Abstract

This paper investigates the bias and the weak Bahadur representation of a local
polynomial estimator of the conditional quantile function and its derivatives.
The bias and Bahadur remainder term are studied uniformly with respect to
the quantile level, the covariates and the smoothing parameter. The order of the
local polynomial estimator can be higher than the differentiability order of the
conditional quantile function. Applications of the results deal with global
optimal consistency rates of the local polynomial quantile estimator, performance
of random bandwidths and estimation of the conditional quantile density
function. The latter yields a simple estimator of the conditional quantile
function of the private values in first-price sealed-bid auctions under the
independent private values paradigm and risk neutrality.
1. Introduction

The conditional quantile function is a powerful tool to represent the dependence between two
variables. Let $Q(\alpha|x)$, $\alpha \in (0,1)$, be the conditional quantile function of a univariate dependent
variable $Y$ given $X = x$, where $X$ is the $d$-dimensional covariate, $Q(\alpha|x) = \inf\{y : \mathbb{P}(Y \le y|X = x) \ge \alpha\}$.
Under fairly general conditions, the Lévy-Smirnov-Rosenblatt transformation ensures
that there is a random variable $A$ independent of $X$ and uniform over $[0,1]$ such that

(1.1)   $Y = Q(A|X)$.

In other words, the knowledge of the conditional quantile function allows one to compute the impact
on $Y$ of a shock on $X$ for any given $A$. The conditional quantile function is also central in
the identification of the impact of such shocks, or of more general parameters, in nonseparable
models in microeconometrics, see Chesher (2003), Chernozhukov and Hansen (2005), Hoderlein
and Mammen (2007) and Imbens and Newey (2009) to mention just a few. See also Firpo,
Fortin and Lemieux (2009) or Rothe (2010) for an unconditional point of view when evaluating
distributional policy effects. Conditional quantile approaches can also be useful in industrial
organization due to the important role played by increasing functions and the equivariance
property of quantile functions, which states that $\varphi(Q(\alpha|x))$ is the conditional quantile function
of $\varphi(Y)$ given $X$ provided $\varphi$ is an increasing transformation. See Haile, Hong and Shum (2003),
Marmer and Shneyerov (2008) and below for the case of auctions. Echenique and Komunjer
(2009) show the usefulness of a conditional quantile approach when analyzing general
multiple-equilibria economic models.

However, inference with the quantile representation (1.1) is potentially difficult due to
nonseparability. In a regression model $Y = m(X) + \varepsilon$ where $X$ and $\varepsilon$ are independent, the dependence
between $Y$ and $X$ is summarized through the regression function $m(\cdot)$ and does not involve the
unobserved noise $\varepsilon$. This contrasts with (1.1), where the random variable $A$ may potentially
change the shape of $x \mapsto Q(A|x)$. Hence, inference in (1.1) should not focus on a particular
value of the quantile level $\alpha$ but should instead consider all $\alpha$ in an interval $[\underline{\alpha}, \overline{\alpha}]$ close enough
to $[0,1]$, as recommended for instance in the case of the more constrained quantile regression
model analyzed in Koenker (2005). In practice, this often leads to graphical
representations of the estimated curves $x \mapsto \widehat{Q}(\alpha|x)$ for various $\alpha$. A natural norm for evaluating these
estimated graphs is the uniform norm with respect to $\alpha$ and $x$, $\sup_{\alpha,x}|\widehat{Q}(\alpha|x) - Q(\alpha|x)|$.

The present paper contributes to this issue for local polynomial quantile estimators $\widehat{Q}_h(\alpha|x)$,
which depend upon a bandwidth $h$. We study their bias uniformly in $\alpha$ and $x$ and derive a uniform
Bahadur representation for $\widehat{Q}_h(\alpha|x)$ and its derivatives which holds in probability, that is, a
weak Bahadur representation. In a few words, a Bahadur representation is an approximation of
$\widehat{Q}_h(\alpha|x) - Q(\alpha|x)$ by a bias term plus a leading stochastic term, up to a remainder term with an
explicit order. In our setup, uniformity is with respect to the level $\alpha$, the bandwidth $h$ and
the covariate $x$, implying that our Bahadur representation is an important step for the study of
$\sup_{\alpha,x}|\widehat{Q}(\alpha|x) - Q(\alpha|x)|$, see Proposition 2 below. Various other interesting results also follow
from our uniform results.

To be more specific, consider independent and identically distributed observations $(X_1, Y_1), \ldots, (X_n, Y_n)$
with the same distribution as $(X, Y)$. Define, for $\alpha$ in $(0,1)$, the loss function

(1.2)   $\ell_\alpha(q) = |q| + (2\alpha - 1)q = 2q\left(\alpha - \mathbb{I}(q < 0)\right)$, $q \in \mathbb{R}$,

where $\mathbb{R}$ stands for the set of real numbers. It is well known that

(1.3)   $Q(\alpha|x) = \arg\min_{q} \mathbb{E}\left[\ell_\alpha(Y - q)|X = x\right]$

is the conditional quantile of $Y$ given $X = x$. When $d = 1$, the local polynomial estimator of
order $p$ of $Q(\alpha|x)$ is $\widehat{Q}_h(\alpha|x) = \widehat{b}_0(\alpha;h,x)$ where, for $b = (b_0, \ldots, b_p)^\top$,

(1.4)   $\widehat{b}(\alpha;h,x) = \arg\min_{b} \sum_{i=1}^n \ell_\alpha\left(Y_i - b_0 - b_1(X_i - x) - \cdots - \frac{b_p}{p!}(X_i - x)^p\right) K\left(\frac{X_i - x}{h}\right)$.

In the expression above, $p!$ is the factorial $p \times (p-1) \times \cdots \times 1$, $K(\cdot)$ is a kernel function and $h$ is a
smoothing parameter which goes to 0 with the sample size. As detailed in Section 2 and studied
throughout the paper, the local polynomial estimator $\widehat{Q}_h(\alpha|x)$ has a natural extension which
covers the multivariate case $d > 1$. As noted in Fan and Gijbels (1996, Chapter 5), the local
polynomial estimator $\widehat{Q}_h(\alpha|x)$ is a modification of the least squares local polynomial estimator
of a regression function, which uses the square loss function in (1.4) instead of the loss function
$\ell_\alpha(\cdot)$. A Taylor expansion

$Q(\alpha|X_i) \approx Q(\alpha|x) + \frac{\partial Q(\alpha|x)}{\partial x}(X_i - x) + \cdots + \frac{1}{p!}\frac{\partial^p Q(\alpha|x)}{\partial x^p}(X_i - x)^p$

suggests that $\widehat{b}_1(\alpha;h,x), \ldots, \widehat{b}_p(\alpha;h,x)$ estimate the partial derivatives $\partial^r Q(\alpha|x)/\partial x^r$, $r = 1, \ldots, p$,
provided $Q(\alpha|x)$ is smooth enough.
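For intuition only, here is a minimal Python sketch of the univariate estimator (1.4), assuming numpy, an Epanechnikov kernel and a very small sample. Because the objective is a weighted sum of check losses, a minimizer can be found among coefficient vectors that interpolate $p+1$ observations with positive kernel weight (assuming the local design has full rank), so the sketch simply enumerates those candidates. The function names and this brute-force search are illustrative assumptions, not the computational method of the paper; linear programming is the usual practical route.

```python
import itertools
import math

import numpy as np


def check_loss(u, alpha):
    """Loss (1.2): |u| + (2*alpha - 1) * u, i.e. twice the usual check function."""
    return np.abs(u) + (2.0 * alpha - 1.0) * u


def epanechnikov(z):
    """Illustrative compactly supported kernel (an assumption; any kernel works)."""
    return np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z ** 2), 0.0)


def local_poly_quantile(x0, X, Y, alpha, h, p=1, kernel=epanechnikov):
    """Return (b_0, ..., b_p) minimizing the kernel-weighted check loss in (1.4).

    Exact for small samples: enumerates fits through p + 1 observations,
    so the cost grows like n^(p+1). b[0] estimates Q(alpha|x0) and b[r]
    the r-th derivative of Q(alpha|.) at x0.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = kernel((X - x0) / h)
    keep = w > 0  # observations with zero weight do not enter the objective
    Xk, Yk, wk = X[keep], Y[keep], w[keep]
    # Columns (X_i - x0)^r / r!, r = 0, ..., p, matching the scaling in (1.4)
    D = np.column_stack([(Xk - x0) ** r / math.factorial(r) for r in range(p + 1)])
    best_b, best_val = None, np.inf
    for idx in itertools.combinations(range(len(Yk)), p + 1):
        rows = list(idx)
        A = D[rows]
        if abs(np.linalg.det(A)) < 1e-12:  # skip subsets that do not pin down b
            continue
        b = np.linalg.solve(A, Yk[rows])
        val = np.sum(check_loss(Yk - D @ b, alpha) * wk)
        if val < best_val:
            best_b, best_val = b, val
    if best_b is None:
        raise ValueError("not enough observations with positive kernel weight")
    return best_b
```

With noiseless data $Y = 1 + 2X$, for instance, a local linear fit at $x_0 = 0.3$ recovers $(b_0, b_1) = (1.6, 2)$ for every $\alpha$, and with $p = 0$ and equal weights the fit reduces to an empirical $\alpha$-quantile.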

Robust local polynomial estimation of a regression function and its derivatives, including
quantile methods, has already been considered in many research articles. See in particular
Tsybakov (1986) for optimal pointwise consistency rates, Fan (1992) for design adaptation, and
Fan and Gijbels (1996) and Loader (1999) for a general overview. The present paper is perhaps
more specifically related to Truong (1989), Chaudhuri (1991), Hoderlein and Mammen (2009)
and Kong, Linton and Xia (2010). Truong (1989) showed that local median estimators achieve
the global optimal rates of Stone (1982) with respect to $L^m$ norms, $0 < m < \infty$, for conditional
quantile functions satisfying a Lipschitz condition. Chaudhuri (1991) obtained a strong (that is,
holding in an almost sure sense) Bahadur representation for local polynomial quantile
estimators when the kernel function $K(\cdot)$ of (1.4) is uniform. Hong (2003) extended this result
to local polynomial robust M-estimation and more general kernels. The Bahadur representation
of Chaudhuri (1991) is pointwise, that is, it holds for some prescribed $x$ and $\alpha$ and a given
deterministic bandwidth $h \to 0$. As explained and illustrated in Kong et al. (2010), pointwise
Bahadur representations are not sufficient for many applications, including plug-in estimation
of conditional quantile functionals or marginal integration estimators. Hence Kong et al. (2010)
derive a strong uniform Bahadur representation for robust local polynomial M-estimators with
dependent observations. There, uniformity is with respect to the location variable $x$. For local
polynomial quantile estimators of order $p = 1$, Hoderlein and Mammen (2009) consider
uniformity with respect to $\alpha$ and $x$, but they only show that their remainder term is negligible in
probability and do not obtain its order.

In this work, we study the bias term and obtain the order in probability of the Bahadur
remainder term uniformly in $\alpha$, $h$ and $x$ for local polynomial quantile estimators. A first
contribution, given in Theorem 1 below, deals with the bias of local polynomial quantile
estimators. Most of the literature has focused on the case where the order $p$ of the local
polynomial is equal to the order of differentiability of $x \mapsto Q(\alpha|x)$, say $s$. This is somewhat unrealistic
since it amounts to assuming that $s$ is known. Since the case $p < s$ can easily be dealt with
by ignoring higher order derivatives, we focus on the more interesting case $p > s$, which
has apparently not been considered in the statistical and econometric literature. As shown in
Corollary 1, a local polynomial quantile estimator with $p > s$ still estimates $Q(\alpha|x)$
with the optimal rate $n^{-s/(2s+d)}$ of Stone (1982). This suggests that local polynomial estimators
using a high order $p$ should be preferred, since they optimally estimate a wider
range of smooth conditional quantile functions. Another interesting conclusion of our bias study
is that the additional local polynomial coefficients $b^*_v(\alpha;h,x)$, $|v| = \lfloor s\rfloor + 1, \ldots, p$, can diverge, and
Proposition 1 describes a simple example where this indeed happens. Hence, in the local
polynomial setup, a high value of $\widehat{b}_v(\alpha;h,x)$ may also correspond to a non-smooth quantile function, in
which case a lower degree $p < |v|$ could have been used.

Our uniform study of the Bahadur remainder term, namely Theorem 2, is the second main
contribution of the paper. A third contribution builds on the fact that Theorems 1 and 2 hold
uniformly with respect to $x$ in a compact inner subset of the support of $X$. Combining these
results with a study of the stochastic part of the Bahadur representation allows us to show that
the local polynomial quantile estimator achieves the global optimal rates of Stone (1982) for the
$L^m$ and uniform norms provided the bandwidth goes to 0 at an appropriate rate. This result,
stated in Corollary 1, is apparently new and extends Truong (1989), which is restricted to Lipschitz
quantile functions, and Chaudhuri (1991), who considers pointwise optimality. A fourth contribution
uses the fact that Theorems 1 and 2 hold uniformly with respect to $h$ in an interval $[\underline{h}, \overline{h}]$.
Proposition 2 shows that a random bandwidth performs as well as its deterministic equivalent
counterpart with respect to convergence rates of the uniform norm $\sup_{\alpha,x}|\widehat{Q}_{\widehat{h}}(\alpha|x) - Q(\alpha|x)|$.
Such a result gives a solid theoretical basis to the suggestion of Li and Racine (2008) of choosing
the local polynomial bandwidth $h$ via a simpler cross-validation procedure for the conditional
cumulative distribution function. As mentioned earlier, uniformity with respect to $\alpha$ and $x$ is
also useful for graphical representations of (1.1).

A fifth contribution also exploits uniformity with respect to the quantile order $\alpha$.
Proposition 3 considers estimation of the conditional quantile density function

(1.5)   $q(\alpha|x) = \frac{\partial Q(\alpha|x)}{\partial \alpha} = \frac{1}{f(Q(\alpha|x)|x)}$.

As argued in Parzen (1979), the quantile density function $q(\alpha|x)$ or its inverse $1/q(\alpha|x)$ is a
renormalization of the density function $f(y|x)$ which is well suited for exploratory statistical
analysis. The function $q(\alpha|x)$ is also crucial for quantile-based statistical inference. Indeed, the
asymptotic variance of $\widehat{Q}_h(\alpha|x)$ is proportional to

$\frac{1}{nh}\,\frac{\alpha(1-\alpha)\,q^2(\alpha|x)}{f(x)}$,
where $f(\cdot)$ is the marginal density of $X$, see Fan and Gijbels (1996, p. 202). Hence estimating
$q(\alpha|x)$ is useful to estimate the variance of $\widehat{Q}_h(\alpha|x)$. As noted in Guerre, Perrigne and Vuong
(2009), the conditional quantile density function plays an important role in the identification of
first-price sealed-bid auction models. Under the independent private values paradigm and risk
neutrality, the conditional quantile function of the private values $Q_v(\alpha|x)$ satisfies

$Q_v(\alpha|x) = Q_b(\alpha|x) + \frac{\alpha\, q_b(\alpha|x)}{I - 1}$,

where $I$ is the number of bidders and $Q_b(\alpha|x)$ and $q_b(\alpha|x)$ are the conditional quantile function and quantile density function
of the bids. Hence estimating $Q_b(\alpha|x)$ and $q_b(\alpha|x)$ gives a straightforward estimator of the
conditional quantile function of the private values $Q_v(\alpha|x)$, which is an alternative to the two-step
approach of Guerre, Perrigne and Vuong (2000). See Haile et al. (2003) or Marmer and
Shneyerov (2008) for a related estimation strategy.
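As a minimal illustration of this identity, the sketch below treats the bid quantile function and bid quantile density as Python callables (the helper name is hypothetical) and applies the plug-in formula. The check uses a textbook case assumed here for illustration: with $I$ symmetric risk-neutral bidders and private values uniform on $[0,1]$, the equilibrium bid is $(I-1)v/I$, so that $Q_b(\alpha) = (I-1)\alpha/I$ and $q_b(\alpha) = (I-1)/I$, and the formula should return $Q_v(\alpha) = \alpha$.

```python
def private_value_quantile(alpha, bid_quantile, bid_quantile_density, n_bidders):
    """Plug-in formula Q_v(alpha) = Q_b(alpha) + alpha * q_b(alpha) / (I - 1)."""
    if n_bidders < 2:
        raise ValueError("need at least two bidders")
    return bid_quantile(alpha) + alpha * bid_quantile_density(alpha) / (n_bidders - 1)


# Uniform private values on [0, 1] with I = 4 bidders (illustrative assumption)
I = 4
Q_b = lambda a: (I - 1) * a / I  # bid quantile function
q_b = lambda a: (I - 1) / I      # bid quantile density
for a in (0.1, 0.5, 0.9):
    print(a, private_value_quantile(a, Q_b, q_b, I))
```

In practice the two callables would be replaced by estimators of the bid quantile function and quantile density at a given covariate value $x$.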



There are, however, only a few references that address the estimation of $q(\alpha|x)$. For the related
function $q(\alpha|x)\,\partial F(Q(\alpha|x)|x)/\partial x$, Lee and Lee (2008) use a composition approach which
nonparametrically estimates $\partial F(y|x)/\partial x$, $f(y|x)$ and $Q(\alpha|x) = F^{-1}(\alpha|x)$. Haile et al. (2003) and
Marmer and Shneyerov (2008) proceed similarly. Xiang (1995) proposes the estimator

$-\frac{1}{h_q}\int \widehat{F}^{-1}(\alpha + h_q t|x)\, dK_q(t)$,

where $\widehat{F}(y|x)$ is a kernel estimator of the conditional cumulative distribution function, $K_q(\cdot)$ a
probability density (so that $dK_q(t) = K_q'(t)\,dt$) and $h_q$ a smoothing parameter. As argued in Fan and Gijbels (1996),
local polynomial estimators may have better design adaptation properties than kernel ones.
Hence we propose to use the local polynomial $\widehat{Q}_h(\alpha|x)$ instead of the kernel $\widehat{F}^{-1}(\alpha|x)$. Thanks
to uniformity with respect to $\alpha$ in Theorems 1 and 2, the resulting conditional quantile density
function estimator $\widehat{q}(\alpha|x)$ has a simple Bahadur representation which facilitates the study of its
consistency rate, see Proposition 3.

The rest of the paper is organized as follows. The next section groups our main assumptions
and notations and explains in particular how to extend (1.4) to multivariate covariates. Section
3 presents our main results and Section 4 concludes the paper. The proofs of our statements are
gathered in two appendices.

2. Main assumptions and notations 

The definition (1.4) of $\widehat{Q}_h(\alpha|x)$ assumes that the covariate $X$ is univariate. In the
multivariate case, we use a multivariate kernel function $K(z) = K(z_1, \ldots, z_d)$ but we restrict
to a univariate bandwidth for the sake of simplicity. The univariate polynomial expansion
$b_0 + b_1(X_i - x) + \cdots + b_p(X_i - x)^p/p!$ is replaced by a multivariate counterpart defined as follows.
Let $\mathbb{N}$ be the set of natural integers. For $v = (v_1, \ldots, v_d) \in \mathbb{N}^d$, let $|v| = v_1 + \cdots + v_d$ and
let $P$ be the number of $v$'s with $|v| \le p$. Then a generic expression for a multivariate polynomial
function of order $p$ is, for $b$ in $\mathbb{R}^P$,

$U(z)^\top b = \sum_{v:|v| \le p} b_v \frac{z^v}{v!}$, where $z^v = z_1^{v_1} \times \cdots \times z_d^{v_d}$, $U(z) = \left(\frac{z^v}{v!}, |v| \le p\right)^\top$,

and $v! = \prod_{i=1}^d v_i!$. In the expression above, the vectors $v$ of $\mathbb{N}^d$ are ordered according to the
lexicographic order. The multivariate version of the local polynomial estimator (1.4) is

(2.1)   $\widehat{b}(\alpha;h,x) = \arg\min_{b \in \mathbb{R}^P} C_n(b;\alpha,h,x)$ with

$C_n(b;\alpha,h,x) = \frac{1}{nh^d}\sum_{i=1}^n \ell_\alpha\left(Y_i - U(X_i - x)^\top b\right) K\left(\frac{X_i - x}{h}\right)$.



As in the univariate case, the entry $\widehat{b}_0(\alpha;h,x) = \widehat{Q}_h(\alpha|x)$ of $\widehat{b}(\alpha;h,x)$ is an estimator of $Q(\alpha|x)$.
The entry $\widehat{b}_v(\alpha;h,x)$ can be viewed as an estimator of the partial derivative

$b_v(\alpha|x) = \frac{\partial^{|v|} Q(\alpha|x)}{\partial x_1^{v_1} \cdots \partial x_d^{v_d}}$

provided this partial derivative exists. We shall later consider the following Hölder class.

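Consider a subset $[\underline{\alpha}, \overline{\alpha}]$ of $(0,1)$ over which $Q(\alpha|x)$ or its partial derivatives will be estimated.
Let $\lfloor s\rfloor$ be the lower integer part of $s$, i.e. $\lfloor s\rfloor$ is the unique integer with $\lfloor s\rfloor < s \le \lfloor s\rfloor + 1$.
Then $Q(\cdot|\cdot)$ is in $\mathcal{C}(L,s)$, $L, s > 0$, if

(i) for all $\alpha$ in $[\underline{\alpha}, \overline{\alpha}]$, $x \mapsto Q(\alpha|x)$ is $\lfloor s\rfloor$ times continuously differentiable over the support $\mathcal{X}$
of $X$;

(ii) for all $v$ in $\mathbb{N}^d$ with $|v| = \lfloor s\rfloor$, all $\alpha$ in $[\underline{\alpha}, \overline{\alpha}]$ and all $x, x'$ in $\mathcal{X}$,

$|b_v(\alpha|x) - b_v(\alpha|x')| \le L\|x - x'\|^{s - \lfloor s\rfloor}$,

where $\|\cdot\|$ stands for the Euclidean norm.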
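With the convention $\lfloor s\rfloor < s \le \lfloor s\rfloor + 1$, the lower integer part differs from the usual floor when $s$ is an integer: for $s = 2$, $\lfloor s\rfloor = 1$ and condition (ii) then asks for Lipschitz first derivatives. A tiny Python helper (hypothetical names, for illustration only) makes the convention explicit.

```python
import math


def lower_integer_part(s):
    """Unique integer k with k < s <= k + 1 (convention of the class C(L, s))."""
    if s <= 0:
        raise ValueError("s must be positive")
    return math.ceil(s) - 1


def holder_exponent(s):
    """Exponent s - lower_integer_part(s) in condition (ii); it lies in (0, 1]."""
    return s - lower_integer_part(s)


for s in (0.5, 1, 2, 2.5):
    print(s, lower_integer_part(s), holder_exponent(s))
```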

Since the estimators $\widehat{b}_v(\alpha;h,x)$ of the partial derivatives $b_v(\alpha|x)$ converge with different rates,
we use the diagonal standardization matrix

$H = H(h) = \mathrm{Diag}\left(h^{|v|}, v \in \mathbb{N}^d, |v| \le p\right)$.
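To make the ordering and the standardization concrete, the following Python sketch (standard library only; function names are hypothetical) builds the multi-indices $v$ with $|v| \le p$ in lexicographic order, the vector $U(z)$, and the diagonal of $H(h)$, and checks the identity $U(z) = H\,U(z/h)$ that is used in Section 3.

```python
import itertools
import math


def multi_indices(d, p):
    """All v in N^d with |v| <= p, in lexicographic order."""
    return [v for v in itertools.product(range(p + 1), repeat=d) if sum(v) <= p]


def U(z, p):
    """Vector (z^v / v!)_{|v| <= p}, ordered as in multi_indices."""
    out = []
    for v in multi_indices(len(z), p):
        monomial = math.prod(zi ** vi for zi, vi in zip(z, v))
        out.append(monomial / math.prod(math.factorial(vi) for vi in v))
    return out


def H_diag(h, d, p):
    """Diagonal entries h^{|v|} of the standardization matrix H(h)."""
    return [h ** sum(v) for v in multi_indices(d, p)]


z, h, p = (0.2, -0.4), 0.5, 2
scaled = [hv * u for hv, u in zip(H_diag(h, len(z), p), U([zi / h for zi in z], p))]
print(multi_indices(2, 2))  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0)]
print(U(z, p))
print(scaled)               # equal to U(z, p) up to rounding
```

The number of multi-indices, $P$, equals the binomial coefficient $\binom{p+d}{d}$.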

It is well known that local polynomial estimation techniques apply at the boundaries.
However, to avoid technicalities, we focus on points $x$ in an inner subset $\mathcal{X}_0$ of the support $\mathcal{X}$ of $X$.
Our main assumptions are as follows. Let $B(0,1)$ be the closed unit ball
$\{z \in \mathbb{R}^d : \|z\| \le 1\}$.

Assumption X. The distribution of $X$ has a probability density function $f(\cdot)$ with respect to
the Lebesgue measure, which is strictly positive and continuously differentiable over the compact
support $\mathcal{X}$ of $X$. The set $\mathcal{X}_0$ is a compact subset of the interior of $\mathcal{X}$.

Assumption F. The cumulative distribution function $F(\cdot|\cdot)$ of $Y$ given $X$ has a continuous
probability density function $f(y|x)$ with respect to the Lebesgue measure, which is strictly positive
for $y$ in $\mathbb{R}$ and $x$ in $\mathcal{X}$. The partial derivative $\partial F(y|x)/\partial x$ is continuous over $\mathbb{R} \times \mathcal{X}$. There is
an $L_0 > 0$ such that

$|f(y|x) - f(y'|x')| \le L_0\|(x,y) - (x',y')\|$ for all $(x,y), (x',y')$ of $\mathcal{X} \times \mathbb{R}$.



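Assumption K. The nonnegative kernel function $K(\cdot)$ is Lipschitz over $\mathbb{R}^d$, has a compact
support $\mathcal{K}$, and satisfies $\int K(z)\,dz = 1$. For some $\underline{K} > 0$, $K(z) \ge \underline{K}\,\mathbb{I}(z \in B(0,1))$. The bandwidth
$h$ is in $[\underline{h}_n, \overline{h}_n]$ with $0 < \underline{h}_n \le \overline{h}_n < \infty$, $\lim_{n\to\infty} \overline{h}_n = 0$ and $\lim_{n\to\infty} (\log n)/(n\underline{h}_n^d) = 0$.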



Assumption X is standard. Assumption F ensures uniqueness of the conditional quantile $Q(\alpha|x) =
F^{-1}(\alpha|x)$ in (1.3) and existence of the quantile density function (1.5). Assumption K allows for a
wide range of smoothing parameters $h \to 0$ in $[\underline{h}_n, \overline{h}_n]$. In the univariate case $d = 1$, Hong (2003)
restricts to bandwidths $h = O(n^{-1/(2p+3)})$, a condition which is not imposed here, and Chaudhuri (1991)
assumes that $h$ has the exact order $n^{-1/(2p+d)}$. In the simpler context of univariate kernel
regression, Einmahl and Mason (2005) assume $h^d \ge C(\log n)/n$ to obtain uniform consistency, so
that Assumption K is fairly general.

3. Bias study and Bahadur representation 

Applying standard parametric M-estimation theory as detailed in White (1994) or van der
Vaart (1998) suggests that the local polynomial estimator $\widehat{b}(\alpha;h,x)$ of (2.1) is an estimator of
$b^*(\alpha;h,x)$ with

(3.1)   $b^*(\alpha;h,x) = \arg\min_{b} \mathbb{E}\left[\ell_\alpha\left(Y - U(X - x)^\top b\right) K\left(\frac{X - x}{h}\right)\right]$.

In particular, $Q^*_h(\alpha|x) = b^*_0(\alpha;h,x)$ may differ from the true conditional quantile $Q(\alpha|x)$ due
to a bias term $Q^*_h(\alpha|x) - Q(\alpha|x)$. This bias term can be studied using the first-order
condition

$\frac{\partial}{\partial b}\mathbb{E}\left[\ell_\alpha\left(Y - U(X - x)^\top b^*(\alpha;h,x)\right) K\left(\frac{X - x}{h}\right)\right] = 0$,

and the Implicit Function Theorem. This approach gives in particular the order of the
difference between $b^*_v(\alpha;h,x)$ and the $v$th partial derivative $b_v(\alpha|x)$ of $Q(\alpha|x)$, provided this partial
derivative exists.



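Theorem 1. Assume that $Q(\cdot|\cdot)$ is in a Hölder class $\mathcal{C}(L,s)$ with $\lfloor s\rfloor \le p$. Then under
Assumptions F, K and X and provided $\overline{h}$ is small enough, there is a constant $C$ such that for all
$|v| \le \lfloor s\rfloor$ and $n$ large enough,

$\sup_{(\alpha,h,x) \in [\underline{\alpha},\overline{\alpha}] \times [\underline{h},\overline{h}] \times \mathcal{X}_0} \frac{|b^*_v(\alpha;h,x) - b_v(\alpha|x)|}{h^{s-|v|}} \le CL$.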



It follows that $Q^*_h(\alpha|x) - Q(\alpha|x) = O(h^s)$ and more generally that

$b^*_v(\alpha;h,x) - b_v(\alpha|x) = O\left(h^{s-|v|}\right)$

uniformly provided $|v| \le \lfloor s\rfloor$. Since $\lfloor s\rfloor \le p$, the bias order $h^{s-|v|}$ is not affected by the order $p$
of the local polynomial estimator. This bias order is better than the bias order $h^{p-|v|}$, $|v| \le p$,
that would be achieved by suboptimal local polynomial estimators of lower order $p < \lfloor s\rfloor$.

The proof of Theorem 1 establishes a slightly stronger result since it also gives the order of
the coefficients $b^*_v(\alpha;h,x)$ with $|v| > \lfloor s\rfloor$, which correspond to partial derivatives that may not
exist. Indeed, equation (A.8) of the proof of Theorem 1 implies that

(3.2)   $b^*_v(\alpha;h,x) = O\left(h^{s-|v|}\right)$ for $|v| > s$

uniformly in $(\alpha,h,x) \in [\underline{\alpha},\overline{\alpha}] \times [\underline{h},\overline{h}] \times \mathcal{X}_0$. See also Loader (1999, Theorem 4.2), which gives
the less precise $b^*_v(\alpha;h,x) = o(h^{-|v|})$. Hence the higher order polynomial coefficients $b^*_v(\alpha;h,x)$,
$|v| > s$, may diverge when $h \to 0$. That this may indeed be the case can be seen on a simple
regression example. Consider

(3.3)   $Y = m(X) + \varepsilon$, $m(x) = |x|^{1/2}$ if $x \ge 0$, $m(x) = -|x|^{1/2}$ if $x < 0$,

where the $\mathcal{U}([-1,1])$ random variable $X$ and the $\mathcal{N}(0,1)$ noise $\varepsilon$ are independent. Let $\Phi(\cdot)$ be the
cumulative distribution function of the standard normal $\mathcal{N}(0,1)$. In this example, $Q(\alpha|x) =
\Phi^{-1}(\alpha) + m(x)$ inherits the smoothness properties of the regression function $m(\cdot)$. Note that
the derivative of $m(\cdot)$ at $x = 0$ is infinite. It also follows that $Q(\alpha|x)$ is at best in a Hölder
class $\mathcal{C}(L, 1/2)$ since, for $L$ large enough,

$|m(x) - m(x')| \le L|x - x'|^{1/2}$ for all $(x,x') \in [-1,1]^2$,

an inequality that cannot be improved by increasing the exponent $1/2$, as seen by taking $x = 0$
and $x' \to 0$. The next proposition uses the behavior of $m(\cdot)$ at $x = 0$ to show that the rate given
in (3.2) is sharp.



Proposition 1. Suppose that $(X, Y)$ satisfies (3.3). Let $b^*(\alpha;h,x) = (b^*_0(\alpha;h,x), b^*_1(\alpha;h,x))^\top$
from (3.1) be given by a local polynomial procedure of order 1. Then under Assumption K and
$\int zK(z)\,dz = 0$, $b^*_0(0.5;h,0) = m(0) + O(h^{1/2})$ and $b^*_1(0.5;h,0)$ diverges with the exact rate $h^{-1/2}$,

$\lim_{h\to 0} h^{1/2}\, b^*_1(0.5;h,0) = \frac{\int |z|^{3/2}K(z)\,dz}{\int z^2K(z)\,dz}$.



The divergence of $b^*_1(0.5;h,0)$ implies that the estimator $\widehat{b}_1(0.5;h,0)$ diverges in probability.
This is a reminder that observing a large $\widehat{b}_1(0.5;h,0)$ is not an argument for claiming that a local
polynomial estimator of order $p = 1$ should be used.
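The divergence in Proposition 1 can be checked numerically. The sketch below (Python standard library only; names are hypothetical) assumes the uniform kernel $K(z) = \mathbb{I}(|z| \le 1)/2$, for which $\int |z|^{3/2}K(z)\,dz / \int z^2K(z)\,dz = (2/5)/(1/3) = 6/5$. In model (3.3) with $\alpha = 0.5$, oddness of $m(\cdot)$ and symmetry of $\Phi$ and of the kernel give $b^*_0(0.5;h,0) = 0$, and the constant design density of $X$ cancels, so $b^*_1(0.5;h,0)$ solves the one-dimensional first-order condition $\int_{-h}^{h}\{\Phi(b_1x - m(x)) - 1/2\}\,x\,dx = 0$, solved here by bisection with a midpoint quadrature.

```python
import math


def m(x):
    """Regression function of (3.3): sign(x) * |x|^{1/2}."""
    return math.copysign(math.sqrt(abs(x)), x)


def Phi(u):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def foc_b1(b1, h, n_grid=2000):
    """Midpoint rule for int_{-h}^{h} (Phi(b1 x - m(x)) - 1/2) x dx."""
    dx = 2.0 * h / n_grid
    total = 0.0
    for k in range(n_grid):
        x = -h + (k + 0.5) * dx
        total += (Phi(b1 * x - m(x)) - 0.5) * x
    return total * dx


def b1_star(h, n_iter=50):
    """Bisection: foc_b1 increases in b1, is negative at 0 and positive at 10/sqrt(h)."""
    lo, hi = 0.0, 10.0 / math.sqrt(h)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if foc_b1(mid, h) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for h in (1e-2, 1e-3, 1e-4):
    b1 = b1_star(h)
    print(f"h={h:g}  b1*={b1:.3f}  sqrt(h)*b1*={math.sqrt(h) * b1:.4f}")
```

As $h$ decreases, $b^*_1$ grows like $h^{-1/2}$ while the rescaled value $h^{1/2}b^*_1$ should settle near $6/5$.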



We now consider the stochastic terms $\widehat{Q}_h(\alpha|x) - Q^*_h(\alpha|x)$ and the rescaled

$H\left(\widehat{b}(\alpha;h,x) - b^*(\alpha;h,x)\right)$.



Let us first introduce some additional notation. Local polynomial estimation builds on an order
$p$ Taylor expansion of $Q(\alpha|x')$ with $x'$ in the vicinity of $x$. This Taylor expansion can be written
as $Q(\alpha|x') \approx U(x' - x)^\top b_p(\alpha|x)$, where $b_p(\alpha|x)$ groups the partial derivatives of $Q(\alpha|x)$ with
respect to $x$. Consider the following counterpart of the Taylor approximation,

(3.4)   $Q^*(x';\alpha,h,x) = U(x' - x)^\top b^*(\alpha;h,x)$.

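Define also $S_i(\alpha;h,x) = S(X_i, Y_i;\alpha,h,x)$ and $J_i(\alpha;h,x) = J(X_i;\alpha,h,x)$ with

(3.5)   $S_i(\alpha;h,x) = 2\left\{\mathbb{I}\left(Y_i \le Q^*(X_i;\alpha,h,x)\right) - \alpha\right\} U\left(\frac{X_i - x}{h}\right) K\left(\frac{X_i - x}{h}\right)$,

(3.6)   $J_i(\alpha;h,x) = 2 f\left(Q^*(X_i;\alpha,h,x)|X_i\right) U\left(\frac{X_i - x}{h}\right) U\left(\frac{X_i - x}{h}\right)^\top K\left(\frac{X_i - x}{h}\right)$.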

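Since

$U(X_i - x) = H\,U\left(\frac{X_i - x}{h}\right)$

and (1.2) gives

$\frac{\partial}{\partial b}\left[\ell_\alpha\left(Y_i - U(X_i - x)^\top b\right) K\left(\frac{X_i - x}{h}\right)\right] = 2\left\{\mathbb{I}\left(Y_i \le U(X_i - x)^\top b\right) - \alpha\right\} U(X_i - x) K\left(\frac{X_i - x}{h}\right)$

almost everywhere, the variables $S_i(\alpha;h,x)$ satisfy

$\frac{\partial C_n}{\partial b}\left(b^*(\alpha;h,x);\alpha,h,x\right) = \frac{1}{nh^d}\,H\sum_{i=1}^n S_i(\alpha;h,x)$

almost everywhere. Hence $\sum_{i=1}^n S_i(\alpha;h,x)$ can be viewed as a score function term, whereas
$\sum_{i=1}^n J_i(\alpha;h,x)$ is similar to a second derivative of the objective function $C_n$, although
$C_n$ is not twice differentiable. Indeed, it can be shown that $C_n$ admits a quadratic approximation
with second-order derivative matrix

$\frac{1}{nh^d}\,H\left(\sum_{i=1}^n J_i(\alpha;h,x)\right)H$.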
Classical results of White (1994) or van der Vaart (1998) for parametric estimation suggest
that a candidate approximation for $\widehat{b}(\alpha;h,x) - b^*(\alpha;h,x)$ is

$-H^{-1}\left(\frac{1}{n}\sum_{i=1}^n J_i(\alpha;h,x)\right)^{-1}\frac{1}{n}\sum_{i=1}^n S_i(\alpha;h,x)$.

Hence the rescaled $(nh^d)^{1/2}H\left(\widehat{b}(\alpha;h,x) - b^*(\alpha;h,x)\right)$ is expected to be close to

(3.7)   $\beta_n(\alpha;h,x) = -\left(\frac{1}{nh^d}\sum_{i=1}^n J_i(\alpha;h,x)\right)^{-1}\frac{1}{(nh^d)^{1/2}}\sum_{i=1}^n S_i(\alpha;h,x)$.

The matrix $\sum_{i=1}^n J_i(\alpha;h,x)/(nh^d)$ is similar to a kernel regression estimator and obeys a Law of Large
Numbers for triangular arrays, which ensures that this matrix is asymptotically close to

$2f\left(Q^*(x;\alpha,h,x)|x\right) f(x) \int U(t)U(t)^\top K(t)\,dt$.

Since this matrix is symmetric positive definite, the inverse in (3.7) exists with probability
tending to 1. The term $\sum_{i=1}^n S_i(\alpha;h,x)/(nh^d)^{1/2}$ has a similar kernel structure but with centered
$S_i(\alpha;h,x)$, see (A.1) in Lemma A.1 of Appendix A. Hence $\sum_{i=1}^n S_i(\alpha;h,x)/(nh^d)^{1/2}$ satisfies a
pointwise Central Limit Theorem, as does $\beta_n(\alpha;h,x)$. Hence $(nh^d)^{1/2}H\left(\widehat{b}(\alpha;h,x) - b^*(\alpha;h,x)\right)$
should also be asymptotically Gaussian provided the so-called Bahadur error term

(3.8)   $E_n(\alpha;h,x) = (nh^d)^{1/2}H\left(\widehat{b}(\alpha;h,x) - b^*(\alpha;h,x)\right) - \beta_n(\alpha;h,x)$

is asymptotically negligible pointwise. But transposing the various uniform results established
in the Appendices for the leading term $\beta_n(\alpha;h,x)$ to the expansion of $(nh^d)^{1/2}H\left(\widehat{b}(\alpha;h,x) - b^*(\alpha;h,x)\right)$
requires a uniform study of $E_n(\alpha;h,x)$.

Techniques to study $E_n(\alpha;h,x)$ for fixed arguments $\alpha$, $h$ and $x$ are given in Hjort and
Pollard (1993). See also Fan, Heckman and Wand (1995, p. 143) or Fan and Gijbels (1996,
p. 210). In our uniform setup, obtaining a uniform order for $E_n(\alpha;h,x)$ relies on a
preliminary uniform study of a stochastic process we introduce now. Define first

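$L_{1n}(\beta;\alpha,h,x) = nh^d\left\{C_n\left[b^*(\alpha;h,x) + \frac{H^{-1}\beta}{(nh^d)^{1/2}};\alpha,h,x\right] - C_n\left(b^*(\alpha;h,x);\alpha,h,x\right)\right\}$

$= \sum_{i=1}^n \left\{\ell_\alpha\left(Y_i - Q^*(X_i;\alpha,h,x) - \frac{U\left(\frac{X_i - x}{h}\right)^\top\beta}{(nh^d)^{1/2}}\right) - \ell_\alpha\left(Y_i - Q^*(X_i;\alpha,h,x)\right)\right\} K\left(\frac{X_i - x}{h}\right)$,

which is such that

$(nh^d)^{1/2}H\left(\widehat{b}(\alpha;h,x) - b^*(\alpha;h,x)\right) = \arg\min_\beta L_{1n}(\beta;\alpha,h,x)$.

It then follows from (3.8) that

$E_n(\alpha;h,x) = \arg\min_e L_n\left(\beta_n(\alpha;h,x), e;\alpha,h,x\right)$ where

(3.9)   $L_n(\beta, e;\alpha,h,x) = L_{1n}(\beta + e;\alpha,h,x) - L_{1n}(\beta;\alpha,h,x)$.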

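Hence the stochastic process $L_n$ plays a central role in our analysis. Especially useful is the
decomposition

$L_n(\beta, e;\alpha,h,x) = L_n^0(\beta, e;\alpha,h,x) + M_n(\beta, e;\alpha,h,x)$,

where $L_n^0$ is the quadratic approximation of $L_n$,

$L_n^0(\beta, e;\alpha,h,x) = \frac{1}{(nh^d)^{1/2}}\sum_{i=1}^n S_i(\alpha;h,x)^\top(\beta + e) + \frac{1}{2}(\beta + e)^\top\left(\frac{1}{nh^d}\sum_{i=1}^n J_i(\alpha;h,x)\right)(\beta + e)$

$\qquad - \frac{1}{(nh^d)^{1/2}}\sum_{i=1}^n S_i(\alpha;h,x)^\top\beta - \frac{1}{2}\beta^\top\left(\frac{1}{nh^d}\sum_{i=1}^n J_i(\alpha;h,x)\right)\beta$

(3.10)   $= \frac{1}{(nh^d)^{1/2}}\sum_{i=1}^n S_i(\alpha;h,x)^\top e + \frac{1}{2}e^\top\left(\frac{1}{nh^d}\sum_{i=1}^n J_i(\alpha;h,x)\right)(e + 2\beta)$,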

and $M_n$ is a remainder term. As in the expression (3.9) above for $E_n(\alpha;h,x)$, the variable $\beta$
in (3.10) will be taken equal to $\beta_n(\alpha;h,x)$ in the proof of Theorem 2 below. As noted
in the quadratic approximation lemma of Fan et al. (1995, p. 148) in the pointwise case, the
order of $E_n(\alpha;h,x)$ is driven by the order of $M_n$. The proof of the next theorem relies on a
uniform study of $M_n$ based on a maximal inequality under bracketing entropy conditions from
Massart (2007), see the proof of Proposition A.1. This maximal inequality plays here the role of
the Bernstein inequality used in the pointwise framework of Hong (2003).
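A short computation based only on (3.7) and (3.10) shows why the order of $E_n(\alpha;h,x)$ is governed by $M_n$. Writing, for brevity, $A_n = (nh^d)^{-1}\sum_{i=1}^n J_i(\alpha;h,x)$ and $s_n = (nh^d)^{-1/2}\sum_{i=1}^n S_i(\alpha;h,x)$ (shorthand introduced only here), one has

```latex
\beta_n(\alpha;h,x) = -A_n^{-1}s_n ,
\qquad
\frac{\partial}{\partial e}\,L_n^0\big(\beta_n(\alpha;h,x), e;\alpha,h,x\big)
  = s_n + A_n\big(e + \beta_n(\alpha;h,x)\big) = A_n e .
```

Since $A_n$ is symmetric positive definite with probability tending to one, the quadratic part is minimized at $e = 0$, so a nonzero Bahadur error $E_n(\alpha;h,x)$ can only come from the remainder $M_n$.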
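Theorem 2. Under Assumptions F, K and X,

$\sup_{(\alpha,h,x) \in [\underline{\alpha},\overline{\alpha}] \times [\underline{h}_n,\overline{h}_n] \times \mathcal{X}_0}\|E_n(\alpha;h,x)\| = O_P\left(\left(\frac{\log^3 n}{n\underline{h}_n^d}\right)^{1/4}\right)$.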



In the case where the lower and upper bandwidths $\underline{h}$ and $\overline{h}$ have the same order, Theorem 2
gives, uniformly in $h$ in $[\underline{h},\overline{h}]$, $\alpha$ and $x$,

$\widehat{Q}_h(\alpha|x) = Q^*_h(\alpha|x) + \frac{e_0^\top\beta_n(\alpha;h,x)}{(nh^d)^{1/2}} + O_P\left(\left(\frac{\log n}{nh^d}\right)^{3/4}\right)$,

where $e_0$ is the first vector of the canonical basis of $\mathbb{R}^P$, whose first coordinate is equal to 1 and
the other ones are equal to 0. For $h$ of order $n^{-1/(2p+d)}$ as studied in Chaudhuri (1991, Theorem
3.2), the order of the remainder term is $n^{-3p/(2(2p+d))}\log^{3/4} n$, as found by this author. When
$d = 1$, Hong (2003) obtains the better order $(\log\log n/(nh))^{3/4}$, but his Bahadur representation
only holds pointwise in $\alpha$ and $x$. It can be conjectured that the order $(\log n/(nh^d))^{3/4}$ is
optimal for Bahadur expansions holding uniformly with respect to $x$.
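The exponent quoted for Chaudhuri's bandwidth choice follows from a one-line computation, sketched here for convenience:

```latex
h \asymp n^{-1/(2p+d)}
\;\Longrightarrow\;
nh^d \asymp n^{1-\frac{d}{2p+d}} = n^{\frac{2p}{2p+d}}
\;\Longrightarrow\;
\left(\frac{\log n}{nh^d}\right)^{3/4} \asymp n^{-\frac{3p}{2(2p+d)}}\log^{3/4} n .
```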
For higher order partial derivatives, Theorem 2 yields

$\widehat{b}_v(\alpha;h,x) = b^*_v(\alpha;h,x) + \frac{e_v^\top\beta_n(\alpha;h,x)}{(nh^d)^{1/2}h^{|v|}} + O_P\left(\frac{1}{h^{|v|}}\left(\frac{\log n}{nh^d}\right)^{3/4}\right)$,

where the $v$th entry of $e_v$ is 1 and the other ones are 0, see also Hong (2003) for a pointwise version
of this expansion and Kong et al. (2010) for a version which is uniform with respect to $x$. Such an
expansion can be used to study the pointwise asymptotic normality of the local polynomial
quantile estimator. Combining this Bahadur representation with the bias study of Theorem 1
gives a global rate result which is apparently new. The next corollary extends the study of local
medians in Truong (1989).

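Corollary 1. Assume that $Q(\alpha|x)$ is in $\mathcal{C}(L,s)$ for some $s$ with $\lfloor s\rfloor \le p$. Suppose that Assumptions F,
K and X hold. Then for all partial derivative orders $v$ with $|v| \le \lfloor s\rfloor$ and all $\alpha$ in $[\underline{\alpha},\overline{\alpha}]$,

(i) $\left(\int_{\mathcal{X}_0}\left|\widehat{b}_v(\alpha;h,x) - b_v(\alpha|x)\right|^m dx\right)^{1/m} = O_P\left(\left(\frac{1}{n}\right)^{\frac{s-|v|}{2s+d}}\right)$ for any finite $m > 0$ provided $h$ is
asymptotically proportional to $n^{-\frac{1}{2s+d}}$;

(ii) $\sup_{x\in\mathcal{X}_0}\left|\widehat{b}_v(\alpha;h,x) - b_v(\alpha|x)\right| = O_P\left(\left(\frac{\log n}{n}\right)^{\frac{s-|v|}{2s+d}}\right)$ if $h$ is asymptotically proportional to
$\left(\frac{\log n}{n}\right)^{\frac{1}{2s+d}}$.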

In a regression model such as (3.3), the $b_v(\alpha|x)$ coincide, up to the additive constant $\Phi^{-1}(\alpha)$ when $v = 0$,
with the partial derivatives of $m(x)$. It therefore follows from Stone (1982) that the global rates derived in Corollary 1
are optimal in a minimax sense.
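The bandwidth choice in Corollary 1 balances the bias order $h^{s-|v|}$ of Theorem 1 against the order $h^{-|v|}(nh^d)^{-1/2}$ of the leading stochastic term in the expansion of $\widehat{b}_v(\alpha;h,x)$ following Theorem 2 (logarithmic factors ignored). A small exact-arithmetic check in Python, with a hypothetical helper name:

```python
from fractions import Fraction


def corollary_exponents(s, d, v_abs):
    """With h = n^(-a), a = 1/(2s+d), return (e_bias, e_std) such that
    h^(s-|v|) = n^(-e_bias) and h^(-|v|) (n h^d)^(-1/2) = n^(-e_std)."""
    s = Fraction(s)
    a = 1 / (2 * s + d)
    e_bias = a * (s - v_abs)
    e_std = (1 - a * d) / 2 - a * v_abs
    return e_bias, e_std


for s, d, v in [(2, 1, 0), (2, 1, 1), (Fraction(3, 2), 2, 1)]:
    print(s, d, v, corollary_exponents(s, d, v))
```

Both exponents equal $(s-|v|)/(2s+d)$, which is the rate exponent of Corollary 1-(i).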






A second application builds on the uniformity with respect to the bandwidth $h$ of our
Bahadur representation. The next proposition allows for data-driven bandwidths. Observe that
it also deals with the uniform norm $\sup_{(\alpha,x)\in[\underline{\alpha},\overline{\alpha}]\times\mathcal{X}_0}|\widehat{Q}_{\widehat{h}}(\alpha|x) - Q(\alpha|x)|$, which evaluates the
estimated curves $(\alpha,x) \mapsto \widehat{Q}_h(\alpha|x)$ used in empirical graphical illustrations of (1.1).



Proposition 2. Consider a random bandwidth $\widehat{h}_n$ such that $\widehat{h}_n = O_P(h_n)$ and $1/\widehat{h}_n = O_P(1/h_n)$,
where $h_n$ is a deterministic sequence satisfying $h_n = o(1)$ and $\lim_{n\to\infty}(\log n)/(nh_n^d) = 0$. Suppose
that Assumptions F, K and X hold and that $Q(\alpha|x)$ is in $\mathcal{C}(L,s)$. Then for any $v$ with $|v| \le \lfloor s\rfloor$,

$\sup_{(\alpha,x)\in[\underline{\alpha},\overline{\alpha}]\times\mathcal{X}_0}\left|\widehat{b}_v(\alpha;\widehat{h}_n,x) - b_v(\alpha|x)\right| = O_P\left(h_n^{s-|v|} + \frac{1}{h_n^{|v|}}\left(\frac{\log n}{nh_n^d}\right)^{1/2}\right)$.

In particular, if the exact order of $\widehat{h}_n$ is $(\log(n)/n)^{1/(2s+d)}$ in probability, $\sup_{(\alpha,x)}|\widehat{b}_v(\alpha;\widehat{h}_n,x) - b_v(\alpha|x)|$
has the optimal order $(\log(n)/n)^{(s-|v|)/(2s+d)}$ of Corollary 1-(ii). It is likely that an $L^m$ version of
Proposition 2 holds, but it is slightly longer to prove. Proposition 2 can, for instance, be fruitfully
applied to cross-validated bandwidths for the conditional cumulative distribution function, as proposed
by Li and Racine (2008).

Our last application builds on the fact that Theorems 1 and 2 hold uniformly with respect
to the quantile order $\alpha$. This application concerns estimation of the conditional quantile density
function (1.5). The considered estimator of $q(\alpha|x)$ is a conditional version of the Parzen (1979)
convolution estimator,

(3.11)   $\widehat{q}(\alpha|x) = \frac{1}{h_q}\int \widehat{Q}_h(a|x)\,dK_q\left(\frac{a - \alpha}{h_q}\right) = \frac{1}{h_q}\int \widehat{Q}_h(\alpha + h_q t|x)\,dK_q(t)$,

see also Xiang (1995). In the expression above, $h_q > 0$ is a bandwidth and $K_q(\cdot)$ is a signed
measure over $\mathbb{R}$ such that

$\int dK_q(t) = 0$, $\int t\,dK_q(t) = 1$.

In particular, if $K_q(\cdot)$ has a Lebesgue derivative, $dK_q(t) = K_q'(t)\,dt$, substituting in (3.11) gives

$\widehat{q}(\alpha|x) = \frac{1}{h_q}\int \widehat{Q}_h(\alpha + h_q t|x)K_q'(t)\,dt$.



Computing these integrals may require intensive numerical steps, so that the resulting estimator
may be difficult to implement in practice. A more realistic estimator uses a discrete measure
$K_q(\cdot)$ in (3.11). If $K_q(\cdot)$ is a linear combination of Dirac masses at $t_j$ with weights $\kappa_j$, $j = 1, \ldots, J$,
the resulting estimator

$\widehat{q}(\alpha|x) = \frac{1}{h_q}\sum_{j=1}^J \kappa_j\widehat{Q}_h(\alpha + h_q t_j|x)$, $\sum_{j=1}^J \kappa_j = 0$ and $\sum_{j=1}^J t_j\kappa_j = 1$,



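may indeed be simpler to compute. Note that this includes the well-known numerical derivatives

$\frac{\widehat{Q}_h(\alpha + h_q|x) - \widehat{Q}_h(\alpha|x)}{h_q}$, $\frac{\widehat{Q}_h(\alpha|x) - \widehat{Q}_h(\alpha - h_q|x)}{h_q}$ and $\frac{\widehat{Q}_h(\alpha + h_q|x) - \widehat{Q}_h(\alpha - h_q|x)}{2h_q}$.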
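To see the discrete version of (3.11) at work, the sketch below applies it to a known quantile function instead of $\widehat{Q}_h(\cdot|x)$: the standard normal quantile from Python's `statistics.NormalDist`, whose quantile density is the reciprocal of the standard normal density evaluated at $\Phi^{-1}(\alpha)$. Plugging in the true quantile function is an assumption made only to isolate the differencing error; the helper name is hypothetical. The weights reproduce the three numerical derivatives above and satisfy $\sum_j\kappa_j = 0$ and $\sum_j t_j\kappa_j = 1$.

```python
from statistics import NormalDist


def quantile_density_discrete(Q, alpha, h_q, ts, kappas):
    """(1/h_q) * sum_j kappa_j * Q(alpha + h_q * t_j): discrete K_q in (3.11)."""
    if abs(sum(kappas)) > 1e-12 or abs(sum(t * k for t, k in zip(ts, kappas)) - 1.0) > 1e-12:
        raise ValueError("weights must satisfy sum kappa_j = 0 and sum t_j kappa_j = 1")
    return sum(k * Q(alpha + h_q * t) for t, k in zip(ts, kappas)) / h_q


std_normal = NormalDist()
Q = std_normal.inv_cdf
alpha, h_q = 0.3, 1e-3
q_true = 1.0 / std_normal.pdf(Q(alpha))

forward = quantile_density_discrete(Q, alpha, h_q, ts=(1.0, 0.0), kappas=(1.0, -1.0))
backward = quantile_density_discrete(Q, alpha, h_q, ts=(0.0, -1.0), kappas=(1.0, -1.0))
central = quantile_density_discrete(Q, alpha, h_q, ts=(1.0, -1.0), kappas=(0.5, -0.5))
print(q_true, forward, backward, central)
```

Consistent with a Taylor expansion in $\alpha$, the central difference error is of order $h_q^2$ here, against order $h_q$ for the two one-sided versions.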



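To study the bias of $\widehat{q}(\alpha|x)$, we strengthen the definition of the smoothness class $\mathcal{C}(L,s)$ as
follows. $Q(\alpha|x)$ is in $\mathcal{C}_q(L,s)$ if

(i) $Q(\alpha|x)$ is in $\mathcal{C}(L, s+1)$;

(ii) for each $x$ in $\mathcal{X}$, $\alpha \mapsto q(\alpha|x)$ is $\lfloor s\rfloor$ times differentiable over $[\underline{\alpha},\overline{\alpha}]$;

(iii) for each $x$ in $\mathcal{X}$ and all $(\alpha,\alpha') \in [\underline{\alpha},\overline{\alpha}]^2$,

$\left|\frac{\partial^{\lfloor s\rfloor}q(\alpha|x)}{\partial\alpha^{\lfloor s\rfloor}} - \frac{\partial^{\lfloor s\rfloor}q(\alpha'|x)}{\partial\alpha^{\lfloor s\rfloor}}\right| \le L|\alpha - \alpha'|^{s-\lfloor s\rfloor}$.

We shall assume in addition that $K_q(\cdot)$ has a compact support and satisfies the additional
conditions

$\int t^{j+1}\,dK_q(t) = 0$, $j = 1, \ldots, \lfloor s\rfloor$, and $\int |dK_q(t)| < \infty$.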



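Proposition 3. Assume that $Q(\alpha|x)$ is in $\mathcal{C}_q(L,s)$ and $\lfloor s + 1\rfloor \le p$. Suppose that Assumptions
K, F and X hold with $h = O(h_q)$, $h_q \to 0$ and $(\log n)/(nh^d) \to 0$. Then for any $x$ in $\mathcal{X}_0$ and $\alpha$
in $(\underline{\alpha},\overline{\alpha})$,

$\widehat{q}(\alpha|x) = q(\alpha|x) + O_P\left(h_q^s + \frac{1}{(nh^dh_q)^{1/2}}\right) + \frac{\log^{3/4}n}{(nh^dh_q^2)^{1/4}}\,O_P\left(\frac{1}{(nh_qh^d)^{1/2}}\right)$.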



Taking h q and h of the same order is the optimal choice for the order of h in the expansion of 
Proposition [3j This gives 

/ 1 \ log 3 / 4 n / 1 \ 

q(a\x) = q(a\x) + O p I h s + -pj + -jtO f -pr . 

\ (nh d + l ) 1/2 J (nh d + 2 ) 1/A \{nh d + l ) 1/2 J 

The item $(\log^{3/4}n)(nh^{d+2})^{-1/4}\,O_P\big((nh^{d+1})^{-1/2}\big)$ is given by the Bahadur error term $E_n(\alpha;h,x)$ of Theorem 2. The other item, $O_P\big(h^s + (nh^{d+1})^{-1/2}\big)$, can be viewed as a bias-variance decomposition component. The latter is the leading term of the expansion provided $nh^{d+2} \to \infty$, a condition also used in Lee and Lee (2008) when $d = 1$. In this case, the optimal order for $h$ is $n^{-1/(2s+d+1)}$, which is such that $nh^{d+2} \to \infty$ provided $s > 1/2$. The optimal rate for pointwise estimation of $q(\alpha|x)$ is then $n^{-s/(2s+d+1)}$ which, as expected from (1.5), coincides with the optimal rate for pointwise estimation of $f(y|x)$.
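The exponent arithmetic in the last paragraph can be checked mechanically. The sketch below, with hypothetical function names, balances $h^s$ against $(nh^{d+1})^{-1/2}$ in exact rational arithmetic and tests whether $nh^{d+2}\to\infty$ at the resulting bandwidth $h \asymp n^{-1/(2s+d+1)}$. It only restates the rate calculus and says nothing about constants.

```python
from fractions import Fraction


def bandwidth_exponent(s, d):
    """e such that h ~ n**(-e) balances h**s and (n h**(d+1))**(-1/2): e = 1/(2s+d+1)."""
    return 1 / (2 * Fraction(s) + d + 1)


def bias_exponent(s, d):
    """Exponent r in h**s = n**(-r) at the balancing bandwidth."""
    return Fraction(s) * bandwidth_exponent(s, d)


def variance_exponent(s, d):
    """Exponent r in (n h**(d+1))**(-1/2) = n**(-r) at the balancing bandwidth."""
    return (1 - (d + 1) * bandwidth_exponent(s, d)) / 2


def bahadur_term_negligible(s, d):
    """True when n h**(d+2) -> infinity at the balancing bandwidth, i.e. when s > 1/2."""
    return 1 - (d + 2) * bandwidth_exponent(s, d) > 0


for s, d in [(Fraction(3, 4), 1), (2, 1), (2, 3)]:
    print(f"s={s}, d={d}: h ~ n^-{bandwidth_exponent(s, d)}, "
          f"rate n^-{bias_exponent(s, d)}, nh^(d+2) diverges: {bahadur_term_negligible(s, d)}")
```

For example, $s = 2$ and $d = 1$ give $h \asymp n^{-1/6}$ and the pointwise rate $n^{-1/3}$.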
4. Final remarks

This paper has investigated the bias and the Bahadur representation of a local polynomial estimator of the conditional quantile function and its derivatives. Compared to the existing literature, a distinctive feature is that the bias and Bahadur remainder term are studied uniformly with respect to the quantile level, the covariates and the smoothing parameter, thereby extending Chaudhuri (1991) and Kong et al. (2010). Our framework also considers the case where the order $p$ of the local polynomial estimator is higher than the order of differentiability $s$ of the conditional quantile function. An interesting consequence of our bias study is that using a local polynomial estimator of order $p > s$ does not affect its rate optimality.

Our uniform study of the bias and of the Bahadur remainder term is applied to derive the global rate optimality of the local polynomial estimators of the conditional quantile function and its derivatives with respect to $L_m$ norms, $0 < m < \infty$, provided the bandwidth goes to 0 at an appropriate rate. This extends Truong (1989), who states a similar result for local medians under a rather strong Lipschitz condition on the conditional quantile function. Another application deals with the performance of randomly selected bandwidths, which are shown to perform as well as their deterministic equivalents in terms of consistency rates in uniform norm. Our framework is flexible enough to be adapted to other global norms. This new result is especially useful in view of the suggestion of Li and Racine (2008) to implement local polynomial quantile estimation with a data-driven bandwidth given by a cross-validation criterion for the conditional cumulative distribution function. A last application, to nonparametric estimation of the quantile density function, can be useful for confidence intervals and in the econometrics of auctions, where the conditional quantile density function plays an important role.

Our uniform results can also be useful for other studies. For instance, an issue far beyond the scope of the present paper is the choice of the local polynomial order $p$. Local polynomial quantile estimation can be implemented using a large $p$, possibly growing with the sample size. This would allow one to estimate very smooth conditional quantile functions with a small bias, although it may inflate the asymptotic variance of the resulting estimator. Another approach would be to use a data-driven local polynomial order $p$. Such a problem is very close to the issue of choosing the order of the kernel when estimating a regression or a probability density function. The latter can be addressed following the recent adaptive approach of Goldenshluger and Lepski (2008, 2009), which gives a data-driven choice of the kernel and bandwidth in the context of the continuous time white noise model. Our uniform Bahadur representation is a preliminary step that can be useful to extend their results to local polynomial quantile estimation.
Appendix A: Proofs of main results

Appendix A groups the proofs of Theorems 1 and 2, Propositions 1, 2 and 3 and Corollary 1. The proofs of intermediary results used to prove these main results are grouped in Appendix B.

We first introduce some additional notation. Sequences $\{a_n\}$ and $\{b_n\}$ satisfy $a_n \asymp b_n$ if $|a_n|/C \le |b_n| \le C|a_n|$ for some $C > 0$ and $n$ large enough. Recall that $\|\cdot\|$ is the Euclidean norm and $B(0,1) = \{z;\ \|z\| \le 1\}$. Let $\succ$ be the usual order for symmetric matrices, that is $A_1 \succ A_2$ if and only if $A_1 - A_2$ is a non-negative symmetric matrix. If $A$ is a symmetric matrix, $\|A\| = \sup_{u\in B(0,1)}\|Au\| = \sup_{u\in B(0,1)}|u^TAu|$ is the largest eigenvalue in absolute value of $A$. This norm is such that $\|AB\| \le \|A\|\,\|B\|$ for any matrix or vector $B$. Denote by $\|\cdot\|_\infty$ the uniform norm, i.e. $\|f(\cdot|\cdot)\|_\infty = \sup_{(y,x)\in\mathbb R\times\mathbb R^d}|f(y|x)|$.
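The matrix conventions above can be checked numerically. This is a small sketch with NumPy (the helper names are hypothetical): for a symmetric matrix the operator norm coincides with the largest absolute eigenvalue, the norm is submultiplicative, and $A_1 \succ A_2$ is tested through the smallest eigenvalue of $A_1 - A_2$.

```python
import numpy as np


def operator_norm(A):
    """||A|| = sup_{||u|| <= 1} ||A u||, i.e. the largest singular value of A."""
    return np.linalg.norm(A, 2)


def max_abs_eigenvalue(A):
    """Largest eigenvalue in absolute value of a symmetric matrix A."""
    return np.max(np.abs(np.linalg.eigvalsh(A)))


def loewner_geq(A1, A2, tol=1e-12):
    """A1 > A2 in the order of the text: A1 - A2 is non-negative (positive semi-definite)."""
    return bool(np.min(np.linalg.eigvalsh(A1 - A2)) >= -tol)


A = np.array([[2.0, 1.0], [1.0, -3.0]])  # symmetric but indefinite
print(operator_norm(A), max_abs_eigenvalue(A))  # the two values agree
print(loewner_geq(4 * np.eye(2), A), loewner_geq(A, np.zeros((2, 2))))
```

The indefinite example is deliberate: $\|A\|$ picks the eigenvalue of largest modulus, here the negative one.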
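We use the abbreviation $\theta = (\alpha,h,x)$. In particular, $Q^*(x';\theta)$, $S_i(\theta)$ and $J_i(\theta)$ stand for $Q^*(x';\alpha,h,x)$, $S(X_i,Y_i;\alpha,h,x)$ and $J(X_i;\alpha,h,x)$, see equations (3.4), (3.5) and (3.6). We abbreviate $\underline h_n$ and $\bar h_n$ into $\underline h$ and $\bar h$. Define

$$\Theta^0 = [\underline\alpha,\bar\alpha]\times[0,\bar h]\times\mathcal X_0, \qquad \Theta^1 = [\underline\alpha,\bar\alpha]\times[\underline h,\bar h]\times\mathcal X_0,$$

where $\mathcal X_0$ is as in Assumption X and $[\underline\alpha,\bar\alpha] \subset (0,1)$ is as in the definition of the smoothness class $C(L,s)$. For $\mathcal L_n(b;\alpha,h,x) = \mathcal L_n(b;\theta)$ as in (2.1), define

$$\mathcal L(b;\theta) = E[\mathcal L_n(b;\theta)] = \frac{1}{h^d}E\left[\big\{\ell_\alpha\big(Y - U(X-x)^Tb\big) - \ell_\alpha(Y)\big\}K\Big(\frac{X-x}{h}\Big)\right].$$

We also use $K_h(z) = K(z/h)$. It is convenient to change $b$ into its standardization $B = Hb$ and to define $B(\theta) = Hb(\theta)$ and $B^*(\theta) = Hb^*(\theta)$. Absolute constants are denoted by the generic letter $C$ and may vary from line to line.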

The following argument is used systematically. Recall that $\mathcal X_0$ is an inner subset of the compact $\mathcal X$ under Assumption X. Hence for any $(x,z) \in \mathcal X_0\times\mathcal K$, $x + hz$ is in $\mathcal X$ under Assumption K provided $h$ is small enough.

The next lemma is used in the proof of Theorems 1 and 2. Its proof is given in Appendix B with the proofs of the other intermediary results.

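Lemma A.1. Under Assumptions F, K and X, we have for $h$ small enough:

(i) $b^*(\theta)$ exists and is unique for all $\theta$ in $\Theta^0$.

(ii) $B^*(\theta) = Hb^*(\theta)$ satisfies

$$E[S_i(\theta)] = \int\big\{F(U(z)^TB^*(\theta)|x+hz) - F(Q(\alpha|x+hz)|x+hz)\big\}f(x+hz)U(z)K(z)\,dz = 0, \tag{A.1}$$

$$\lim_{h\to0}\sup_{\theta\in\Theta^0}\|B^*(\theta) - B^*(\alpha;0,x)\| = 0, \tag{A.2}$$

where $B^*(\alpha;0,x) = (Q(\alpha|x),0,\dots,0)^T$.

(iii) For all $(x',\theta_i)$ in $\mathcal X\times\Theta^1$, $i = 1,2$,

$$|Q^*(x';\theta_1) - Q^*(x';\theta_2)| \le C\underline h^{-p}(1+\underline h^{-1})\|\theta_1 - \theta_2\|.$$

(iv) There exists $C$ such that, for all $\theta$ in $\Theta^1$, all $x'$ in $\mathcal X$ and all $x$ in $\mathcal X_0$,

$$f(Q^*(x';\theta)|x')\,K\Big(\frac{x'-x}{h}\Big) \ge C\,K\Big(\frac{x'-x}{h}\Big).$$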

A.1. Proof of Theorem 1. Since $Q(\cdot|\cdot)$ is in $C(L,s)$, the Taylor-Lagrange formula and Assumption K yield that there exists $t = t(h,x,z)$ in $(0,1)$ such that for $h$ small enough and all $(x,z)$ in $\mathcal X_0\times\mathcal K$,

$$\begin{aligned}Q(\alpha|x+hz) &= \sum_{0\le|v|\le\lfloor s\rfloor}\frac{b_v(\alpha|x)}{v!}(hz)^v + \sum_{|v|=\lfloor s\rfloor}\frac{(hz)^v}{v!}\big(b_v(\alpha|x+thz) - b_v(\alpha|x)\big)\\ &= U(z)^THb(\alpha|x) + e(\theta,z).\end{aligned}\tag{A.3}$$
In the equation above, $b_v(\alpha|x)$ is the $v$th partial derivative of $Q(\alpha|x)$ with respect to $x$ and $b(\alpha|x) = (b_v(\alpha|x),\ |v| \le \lfloor s\rfloor, 0,\dots,0)^T \in \mathbb R^P$. Since $Q(\cdot|\cdot) \in C(L,s)$,

$$\lim_{h\to0}\sup_{(\theta,z)\in\Theta^1\times\mathcal K}\frac{|e(\theta,z)|}{h^s} \le CL. \tag{A.4}$$

Let

$$I(\theta,z) = \int_0^1 f\Big(Q(\alpha|x+hz) + t\big(U(z)^TB^*(\theta) - Q(\alpha|x+hz)\big)\Big|\,x+hz\Big)\,dt.$$

Assumptions F, K and X, $Q(\cdot|\cdot) \in C(L,s)$ and (A.2) give

$$\lim_{h\to0}\sup_{(\theta,z)\in\Theta^0\times\mathcal K}|I(\theta,z) - f(Q(\alpha|x)|x)| = 0. \tag{A.5}$$

A Taylor expansion with integral remainder gives

$$F(U(z)^TB^*(\theta)|x+hz) - F(Q(\alpha|x+hz)|x+hz) = \big(U(z)^TB^*(\theta) - Q(\alpha|x+hz)\big)I(\theta,z).$$

Substituting in the first-order condition (A.1) yields

$$\int U(z)\big(U(z)^TB^*(\theta) - Q(\alpha|x+hz)\big)I(\theta,z)f(x+hz)K(z)\,dz = 0. \tag{A.6}$$



We show that the matrix $\int U(z)U(z)^TI(\theta,z)f(x+hz)K(z)\,dz$ has an inverse. Indeed, Assumptions K and X, (A.5) and $h$ small enough give that, uniformly in $\theta$ in $\Theta^0$ and $A$ in $\mathbb R^P$,

$$A^T\!\int U(z)U(z)^TI(\theta,z)f(x+hz)K(z)\,dz\,A = \int\big(U(z)^TA\big)^2I(\theta,z)f(x+hz)K(z)\,dz \ge C(1+o(1))f(Q(\alpha|x)|x)\int\big(U(z)^TA\big)^2K(z)\,dz \ge C\|A\|^2,$$

using the fact that $A \mapsto \int(U(z)^TA)^2K(z)\,dz$ is a squared norm and norm equivalence over $\mathbb R^P$. It follows that $\int U(z)U(z)^TI(\theta,z)f(x+hz)K(z)\,dz$ is strictly positive definite and has an inverse which satisfies, for $n$ large enough,

$$\sup_{\theta\in\Theta^0}\Big\|\Big(\int U(z)U(z)^TI(\theta,z)f(x+hz)K(z)\,dz\Big)^{-1}\Big\| < \infty. \tag{A.7}$$

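(A.6) and (A.3) give

$$Hb^*(\theta) = Hb(\alpha|x) + \Big(\int U(z)U(z)^TI(\theta,z)f(x+hz)K(z)\,dz\Big)^{-1}\int e(\theta,z)I(\theta,z)f(x+hz)U(z)K(z)\,dz.$$

It then follows from (A.4) and (A.7) that

$$\|Hb^*(\theta) - Hb(\alpha|x)\| \le \sup_{\theta\in\Theta^0}\Big\|\Big(\int U(z)U(z)^TI(\theta,z)f(x+hz)K(z)\,dz\Big)^{-1}\Big\|\,\Big\|\int e(\theta,z)I(\theta,z)f(x+hz)U(z)K(z)\,dz\Big\| \le CLh^s \tag{A.8}$$

uniformly in $\theta$ in $\Theta^0$. This ends the proof of the Theorem and also establishes (3.2), since $b(\alpha|x) = (b_v(\alpha|x),\ |v| \le \lfloor s\rfloor, 0,\dots,0)^T$. □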
A.2. Proof of Proposition 1. Let $\varphi(t) = \exp(-t^2/2)/\sqrt{2\pi}$ and $\Phi(t) = \int_{-\infty}^t\varphi(u)\,du$ be the p.d.f. and c.d.f. of the standard normal. The regression model (3.1) is such that

$$F(y|x) = \Phi(y - m(x)), \qquad f(x) = I(x \in [-1,1]).$$

(A.2) gives that $\lim_{h\to0}\max_{z\in\mathcal K}|U(z)^TB^*(0.5;h,0)| = Q(0.5|0) = m(0) = 0$. Hence (A.1), (A.5) and Assumption K give

$$(1+o(1))\,\varphi(0)\int U(z)\big(U(z)^TB^*(0.5;h,0) - m(hz)\big)K(z)\,dz = 0.$$

Recall that $U(z) = (1,z)^T$, so that the equation above gives

$$\begin{pmatrix}b_0^*(0.5;h,0)\\ h\,b_1^*(0.5;h,0)\end{pmatrix} = (1+o(1))\Big(\int U(z)U(z)^TK(z)\,dz\Big)^{-1}\int U(z)m(hz)K(z)\,dz$$



.□ 



J m(z)ii"(z)(iz 

J | Z | 3/2 g(z)d2 

Jz 2 K(z)dz 

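A.3. Proof of Theorem 2. We first state some intermediary results. The two following propositions deal with the remainder term $R_n(\beta,e;\theta) = \sum_{i=1}^n R_i(\beta,e;\theta)$ from (3.10), where

$$R_i(\beta,e;\theta) = \left\{\ell_\alpha\Big(Y_i - Q^*(X_i;\theta) - \frac{U(\frac{X_i-x}{h})^T(\beta+e)}{(nh^d)^{1/2}}\Big) - \ell_\alpha\Big(Y_i - Q^*(X_i;\theta) - \frac{U(\frac{X_i-x}{h})^T\beta}{(nh^d)^{1/2}}\Big)\right\}K\Big(\frac{X_i-x}{h}\Big) - \frac{S_i(\theta)^Te}{(nh^d)^{1/2}} - \frac12e^T\frac{J_i(\theta)}{nh^d}(e+2\beta).$$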



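Define also

$$\tilde R_i(\beta,e;\theta) = R_i(\beta,e;\theta) + \frac12e^T\frac{J_i(\theta)}{nh^d}(e+2\beta) = \left\{\ell_\alpha\Big(Y_i - Q^*(X_i;\theta) - \frac{U(\frac{X_i-x}{h})^T(\beta+e)}{(nh^d)^{1/2}}\Big) - \ell_\alpha\Big(Y_i - Q^*(X_i;\theta) - \frac{U(\frac{X_i-x}{h})^T\beta}{(nh^d)^{1/2}}\Big) - 2\{I(Y_i \le Q^*(X_i;\theta)) - \alpha\}\frac{U(\frac{X_i-x}{h})^Te}{(nh^d)^{1/2}}\right\}K\Big(\frac{X_i-x}{h}\Big), \tag{A.9}$$

$$R_i^1(\beta,e;\theta) = \tilde R_i(\beta,e;\theta) - E[\tilde R_i(\beta,e;\theta)|X_i], \tag{A.10}$$

$$R_i^2(\beta,e;\theta) = E[\tilde R_i(\beta,e;\theta)|X_i] - \frac12e^T\frac{J_i(\theta)}{nh^d}(e+2\beta), \tag{A.11}$$

which are such that

$$R_n(\beta,e;\theta) = R_n^1(\beta,e;\theta) + R_n^2(\beta,e;\theta), \qquad R_n^j(\beta,e;\theta) = \sum_{i=1}^nR_i^j(\beta,e;\theta),\quad j = 1,2.$$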
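Proposition A.1. Consider two real numbers $t_\beta, t_\varepsilon > 0$ which may depend upon $n$ with $t_\beta \ge 1$, $t_\varepsilon \ge 1/n$ and $(t_\beta + t_\varepsilon)^{1/2}/t_\varepsilon = O\big((nh^d)^{1/4}/\log^{1/2}n\big)$. Then, under Assumptions F, K and X and for $n$ large enough,

$$E\Big[\sup_{(\beta,e,\theta)\in B(0,t_\beta)\times B(0,t_\varepsilon)\times\Theta^1}|R_n^1(\beta,e;\theta)|\Big] \le C\,\frac{t_\varepsilon(t_\beta + t_\varepsilon)^{1/2}\log^{1/2}n}{(nh^d)^{1/4}}.$$

Proposition A.2. Consider two real numbers $t_\beta, t_\varepsilon > 0$ which may depend upon $n$ with $t_\beta \ge 1$ and $t_\beta/t_\varepsilon = o(nh^d/\log^{1/2}n)$. Then, under Assumptions F, K and X and for $n$ large enough,

$$E\Big[\sup_{(\beta,e,\theta)\in B(0,t_\beta)\times B(0,t_\varepsilon)\times\Theta^1}|R_n^2(\beta,e;\theta)|\Big] \le C\,\frac{t_\varepsilon(t_\beta + t_\varepsilon)}{(nh^d)^{1/2}}.$$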

The next lemma is used to bound the eigenvalues of $\sum_{i=1}^nJ_i(\theta)/(nh^d)$ from below. It implies in particular that all the $\beta_n(\theta)$ in (3.7), $\theta$ in $\Theta^1$, are well defined with a probability tending to 1. Let $\gamma(\theta)$ be the smallest eigenvalue of the nonnegative symmetric matrix $\sum_{i=1}^nJ_i(\theta)/(nh^d)$.

Lemma A.2. Under Assumptions F, K and X, $\inf_{\theta\in\Theta^1}\gamma(\theta) \ge \gamma + o_P(1)$ for some $\gamma > 0$.

Lemma A.2 together with Lemma A.3 below gives $\sup_{\theta\in\Theta^1}\|\beta_n(\theta)\| = O_P(\log^{1/2}n)$.

Lemma A.3. Suppose that Assumptions F, K and X are satisfied. Then

$$\sup_{\theta\in\Theta^1}\Big\|\frac{1}{(nh^d)^{1/2}}\sum_{i=1}^nS_i(\theta)\Big\| = O_P(\log^{1/2}n).$$

The rest of the proof of Theorem 2 is divided in two steps. In what follows

$$t_n = t\,\frac{\log^{3/4}n}{(nh^d)^{1/4}}, \qquad t > 0.$$

Under Assumption K, $(\log n)/(nh^d) = o(1)$, so that $t_n = o(\log^{1/2}n)$. In the sequel, $t_n$ will play the role of $t_\varepsilon$, whereas $t_\beta$ will be chosen such that $t_\beta \asymp \log^{1/2}n$. Hence

$$\frac{(t_\beta + t_\varepsilon)^{1/2}}{t_\varepsilon} \asymp \frac{(nh^d)^{1/4}\log^{1/4}n}{t\log^{3/4}n} = \frac1t\,\frac{(nh^d)^{1/4}}{\log^{1/2}n}, \qquad \frac{t_\beta}{t_\varepsilon} \asymp \frac{\log^{1/2}n\,(nh^d)^{1/4}}{t\log^{3/4}n} = \frac1t\Big(\frac{nh^d}{\log n}\Big)^{1/4} = o\Big(\frac{nh^d}{\log^{1/2}n}\Big).$$

Hence these choices of $t_\beta$ and $t_\varepsilon$ satisfy the conditions of Propositions A.1 and A.2 provided $t$ is chosen large enough.



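Step 1: order of $\sup_{(e,\theta)\in B(0,t_n)\times\Theta^1}|R_n(\beta_n(\theta),e;\theta)|$. Consider $\eta > 0$ arbitrarily small. Let $\gamma$ be as in Lemma A.2. Since Lemmas A.2 and A.3 give $\sup_{\theta\in\Theta^1}\|\beta_n(\theta)\| = O_P(\log^{1/2}n)$, there is a $C_\eta$ such that, for $n$ large enough,

$$\begin{aligned}P\Big(\sup_{(e,\theta)\in B(0,t_n)\times\Theta^1}|R_n(\beta_n(\theta),e;\theta)| \ge \frac{\gamma t_n^2}{4}\Big) &\le P\Big(\sup_{(e,\theta)\in B(0,t_n)\times\Theta^1}|R_n(\beta_n(\theta),e;\theta)| \ge \frac{\gamma t_n^2}{4},\ \sup_{\theta\in\Theta^1}\|\beta_n(\theta)\| \le C_\eta\log^{1/2}n\Big)\\ &\quad + P\Big(\sup_{\theta\in\Theta^1}\|\beta_n(\theta)\| > C_\eta\log^{1/2}n\Big)\\ &\le P\Big(\sup_{(\beta,e,\theta)\in B(0,C_\eta\log^{1/2}n)\times B(0,t_n)\times\Theta^1}|R_n(\beta,e;\theta)| \ge \frac{\gamma t_n^2}{4}\Big) + \eta.\end{aligned}$$

Propositions A.1 and A.2, $R_n = R_n^1 + R_n^2$ and the Markov inequality give

$$P\Big(\sup_{(\beta,e,\theta)\in B(0,C_\eta\log^{1/2}n)\times B(0,t_n)\times\Theta^1}|R_n(\beta,e;\theta)| \ge \frac{\gamma t_n^2}{4}\Big) \le \frac{C}{\gamma t_n^2}\left(\frac{t_n(C_\eta\log^{1/2}n + t_n)^{1/2}\log^{1/2}n}{(nh^d)^{1/4}} + \frac{t_n(C_\eta\log^{1/2}n + t_n)}{(nh^d)^{1/2}}\right).$$

The definition of $t_n$, $t_n = o(\log^{1/2}n)$ and Assumption K give

$$\limsup_{n\to\infty}P\Big(\sup_{(e,\theta)\in B(0,t_n)\times\Theta^1}|R_n(\beta_n(\theta),e;\theta)| \ge \frac{\gamma t_n^2}{4}\Big) \le \eta + O\Big(\frac1t\Big)\quad\text{when } t \to \infty. \tag{A.12}$$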

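Step 2: order of $\sup_{\theta\in\Theta^1}\|E_n(\theta)\|$. Consider $r_n > t_n$ and $e = r_n\tilde e$, $\|\tilde e\| = 1$, so that $\|e\| > t_n$. Since $\ell_\alpha(\cdot)$ is convex, $e \mapsto L_n(\beta_n(\theta),e;\theta)$ is convex. Since $L_n(\beta_n(\theta),0;\theta) = 0$ and $L_n = L_n^0 + R_n$, this gives

$$\begin{aligned}\frac{t_n}{r_n}L_n(\beta_n(\theta),e;\theta) &= \frac{t_n}{r_n}L_n(\beta_n(\theta),e;\theta) + \Big(1 - \frac{t_n}{r_n}\Big)L_n(\beta_n(\theta),0;\theta)\\ &\ge L_n\Big(\beta_n(\theta),\frac{t_n}{r_n}e;\theta\Big) = L_n(\beta_n(\theta),t_n\tilde e;\theta)\\ &= L_n^0(\beta_n(\theta),t_n\tilde e;\theta) + R_n(\beta_n(\theta),t_n\tilde e;\theta).\end{aligned}$$

Hence $E_n(\theta) = \arg\min_eL_n(\beta_n(\theta),e;\theta)$ and the latter inequality give

$$\begin{aligned}\{\|E_n(\theta)\| > t_n\} &\subset \Big\{\inf_{\|e\|>t_n}L_n(\beta_n(\theta),e;\theta) \le \inf_{\|e\|\le t_n}L_n(\beta_n(\theta),e;\theta)\Big\}\\ &\subset \Big\{\inf_{\|e\|>t_n}L_n(\beta_n(\theta),e;\theta) \le L_n(\beta_n(\theta),0;\theta) = 0\Big\}\\ &\subset \Big\{\inf_{\|\tilde e\|=1}\big[L_n^0(\beta_n(\theta),t_n\tilde e;\theta) + R_n(\beta_n(\theta),t_n\tilde e;\theta)\big] \le 0\Big\}\\ &\subset \Big\{\inf_{\|e\|=t_n}L_n^0(\beta_n(\theta),e;\theta) - \sup_{\|e\|=t_n}|R_n(\beta_n(\theta),e;\theta)| \le 0\Big\}.\end{aligned}$$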
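Since $\{\sup_{\theta\in\Theta^1}\|E_n(\theta)\| > t_n\} = \bigcup_{\theta\in\Theta^1}\{\|E_n(\theta)\| > t_n\}$, this gives

$$\Big\{\sup_{\theta\in\Theta^1}\|E_n(\theta)\| > t_n\Big\} \subset \bigcup_{\theta\in\Theta^1}\Big\{\inf_{\|e\|=t_n}L_n^0(\beta_n(\theta),e;\theta) - \sup_{\|e\|=t_n}|R_n(\beta_n(\theta),e;\theta)| \le 0\Big\} \subset \Big\{\inf_{\theta\in\Theta^1}\inf_{\|e\|=t_n}L_n^0(\beta_n(\theta),e;\theta) \le \sup_{(e,\theta)\in B(0,t_n)\times\Theta^1}|R_n(\beta_n(\theta),e;\theta)|\Big\}. \tag{A.13}$$

Consider first $\inf_{\theta\in\Theta^1}\inf_{\|e\|=t_n}L_n^0(\beta_n(\theta),e;\theta)$. The definition (3.10) of $L_n^0$ gives, for any $e$ with $\|e\| = t_n$,

$$L_n^0(\beta_n(\theta),e;\theta) = \frac12e^T\Big(\frac{1}{nh^d}\sum_{i=1}^nJ_i(\theta)\Big)e \ge \frac{\gamma(\theta)}{2}t_n^2.$$

Hence (A.13), Lemma A.2 and (A.12) give

$$\begin{aligned}\limsup_{n\to\infty}P\Big(\sup_{\theta\in\Theta^1}\|E_n(\theta)\| > t_n\Big) &\le \limsup_{n\to\infty}P\Big(\sup_{(e,\theta)\in B(0,t_n)\times\Theta^1}|R_n(\beta_n(\theta),e;\theta)| \ge \frac{\gamma t_n^2}{4}\Big) + \limsup_{n\to\infty}P\Big(\inf_{\theta\in\Theta^1}\gamma(\theta) < \frac\gamma2\Big)\\ &\le \eta + O\Big(\frac1t\Big)\quad\text{when } t \to \infty.\end{aligned}$$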



Since the latter can be made arbitrarily small by taking $\eta$ arbitrarily small and then $t$ large enough, the Theorem is proved. □



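A.4. Proof of Corollary 1. Part (i) follows from Theorems 1 and 2 and the triangular inequality, together with

$$\Big(\int_{\mathcal X_0}\|\beta_n(\alpha;h,x)\|^m\,dx\Big)^{1/m} = O_P(1).$$

We now prove the latter. Lemma A.2 and the Hölder inequality give, since $\mathcal X_0$ is compact,

$$\Big(\int_{\mathcal X_0}\|\beta_n(\alpha;h,x)\|^m\,dx\Big)^{1/m} \le C\Big(\int_{\mathcal X_0}\|\beta_n(\alpha;h,x)\|^{2[m]+2}\,dx\Big)^{1/(2[m]+2)} \le \frac{O_P(1)}{\gamma}\Big(\int_{\mathcal X_0}\Big\|\frac{1}{(nh^d)^{1/2}}\sum_{i=1}^nS_i(\alpha;h,x)\Big\|^{2[m]+2}dx\Big)^{1/(2[m]+2)}.$$

Since $E[S_i(\theta)] = 0$, the Marcinkiewicz-Zygmund inequality (see Chow and Teicher, 2003), (3.5) and $h^d \ge C(\log n)/n$ give

$$E\Big[\Big\|\frac{1}{(nh^d)^{1/2}}\sum_{i=1}^nS_i(\theta)\Big\|^{2[m]+2}\Big]^{\frac{1}{2[m]+2}} \le C\,E\Big[\Big(\frac{1}{nh^d}\sum_{i=1}^n\|S_i(\theta)\|^2\Big)^{[m]+1}\Big]^{\frac{1}{2[m]+2}} \le C\,E\Big[\Big(\frac{1}{nh^d}\sum_{i=1}^nI\Big(\frac{X_i-x}{h}\in\mathcal K\Big)\Big)^{[m]+1}\Big]^{\frac{1}{2[m]+2}} = O(1),$$

uniformly in $x$. Part (ii) similarly follows from Lemmas A.2 and A.3, which give $\sup_{\theta\in\Theta^1}\|\beta_n(\theta)\| = O_P(\log^{1/2}n)$. □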

A.5. Proof of Proposition 2. Let $\underline h = h_n/C$ and $\bar h = Ch_n$. The condition on $h_n$ ensures that $\underline h$ and $\bar h$ satisfy Assumption K for all $C > 1$. Recall that Lemma A.2 together with Lemma A.3 gives $\sup_{\theta\in\Theta^1}\|\beta_n(\theta)\| = O_P(\log^{1/2}n)$. Hence (3.9) and Theorems 1 and 2 give, for all $C > 1$,

$$\sup_{(\alpha,x,h)\in[\underline\alpha,\bar\alpha]\times\mathcal X_0\times[\underline h,\bar h]}\big|\hat b_v(\alpha;h,x) - b_v(\alpha|x)\big| = h_n^{-|v|}\,O_P\Big(h_n^s + \Big(\frac{\log n}{nh_n^d}\Big)^{1/2}\Big).$$

This ends the proof of the Proposition since $\liminf_{n\to\infty}P\big(\hat h_n \in [\underline h,\bar h]\big)$ can be made arbitrarily close to 1 by increasing $C$. □

A.6. Proof of Proposition 3. Substituting (3.9) in (3.11) yields

$$\begin{aligned}\hat q(\alpha|x) - q(\alpha|x) &= \frac{1}{h_q}\int Q(\alpha+h_qt|x)\,dK_q(t) - q(\alpha|x) + \frac{1}{h_q}\int\big(Q^*(\alpha+h_qt|x) - Q(\alpha+h_qt|x)\big)\,dK_q(t)\\ &\quad + \int\frac{e_0^T\beta_n(\alpha+h_qt;h,x)}{h_q(nh^d)^{1/2}}\,dK_q(t) + \int\frac{e_0^TE_n(\alpha+h_qt;h,x)}{h_q(nh^d)^{1/2}}\,dK_q(t).\end{aligned}$$

Theorems 1 and 2 with $h = O(h_q)$ and $h_q \to 0$ give

$$\int\frac{e_0^TE_n(\alpha+h_qt;h,x)}{h_q(nh^d)^{1/2}}\,dK_q(t) = \frac{\log^{3/4}n}{(nh^dh_q^2)^{1/4}}\,O_P\Big(\frac{1}{(nh_qh^d)^{1/2}}\Big).$$

Hence it remains to show that

$$\frac{1}{h_q}\int Q(\alpha+h_qt|x)\,dK_q(t) - q(\alpha|x) = O(h_q^s), \tag{A.14}$$

$$\frac{1}{h_q^{1/2}}\int\beta_n(\alpha+h_qt;h,x)\,dK_q(t) = O_P(1). \tag{A.15}$$

The two next steps establish these two equalities.

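Step 1: proof of (A.14). Let $q^{(j)}(\alpha|x) = \partial^jq(\alpha|x)/\partial\alpha^j$. Since $Q(\alpha|x) \in C(L,s+1)$, the Taylor-Lagrange formula gives, for some $\omega$ in $[0,1]$,

$$Q(\alpha+h_qt|x) = Q(\alpha|x) + \sum_{j=1}^{\lfloor s\rfloor+1}\frac{(h_qt)^j}{j!}q^{(j-1)}(\alpha|x) + \frac{(h_qt)^{\lfloor s\rfloor+1}}{(\lfloor s\rfloor+1)!}\Big(q^{(\lfloor s\rfloor)}(\alpha+\omega h_qt|x) - q^{(\lfloor s\rfloor)}(\alpha|x)\Big).$$

The definition of the smoothness class $C_q(L,s)$ gives

$$\big|q^{(\lfloor s\rfloor)}(\alpha+\omega h_qt|x) - q^{(\lfloor s\rfloor)}(\alpha|x)\big| \le L|h_qt|^{s-\lfloor s\rfloor}.$$

Hence, since the support of $K_q(\cdot)$ is compact, $\int|dK_q(t)| < \infty$ and $\int dK_q(t) = 0$, $\int t\,dK_q(t) = 1$, $\int t^2dK_q(t) = \cdots = \int t^{\lfloor s\rfloor+1}dK_q(t) = 0$,

$$\begin{aligned}\frac{1}{h_q}\int Q(\alpha+h_qt|x)\,dK_q(t) &= \frac{Q(\alpha|x)}{h_q}\int dK_q(t) + q(\alpha|x)\int t\,dK_q(t) + \frac{h_q\,q^{(1)}(\alpha|x)}{2}\int t^2dK_q(t) + \cdots\\ &\quad + \frac{h_q^{\lfloor s\rfloor}q^{(\lfloor s\rfloor)}(\alpha|x)}{(\lfloor s\rfloor+1)!}\int t^{\lfloor s\rfloor+1}dK_q(t) + O(h_q^s)\\ &= q(\alpha|x) + O(h_q^s).\end{aligned}$$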



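Step 2: proof of (A.15). Let $\theta_t = (\alpha + h_qt, h, x)$ and $\theta = \theta_0$. Since $\int dK_q(t) = 0$, (3.7) gives

$$\frac{1}{h_q^{1/2}}\int\beta_n(\theta_t)\,dK_q(t) = \frac{1}{h_q^{1/2}}\int\big(\beta_n(\theta_t) - \beta_n(\theta)\big)\,dK_q(t)$$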



(A.16) 



(A.17) 



(0)-Sj (0 t )}dK,(f). 



Since $A \mapsto A^{-1}$ is Lipschitz over the set of positive semi-definite matrices $A$ with smallest eigenvalue bounded from below by $\gamma$, Lemmas A.2 and A.3, (3.6) and Assumption F yield that (A.16) satisfies



< 



< 



Qp(1) 
1/2 



idif 9 (*)i 



Op (log n)Va /■ l " 



1/2 
The definition (3.4) of $Q^*(X;\theta)$ and (A.8) give, since $Q(\alpha|x) \in C(L,s+1)$ and because the support of $K_q$ is compact,

' Xi - x 



i=l 

i r i 



e £ |dK,(t)| 



E 



u 



< 



a <3 17 i=l 



Xj~ X 
h 

€ K 



(Hb*(0 t ) -Hb*(0)) 



X,; - .X 



€/C |dK,(f)| 



1/2 



\Hh*{e t )-nh*{6)\\ \dK q (t)\ 



< Pp(1)^ C /" ||Hb(a + Vk) - Hb 



(a|i)|| |dJ<r,(t)| +0(h 



s+1- 



< a 



■W^TTs (/ IG ( a + Ml*) - Q ( a \ x )\ \ dK i(t)\ + ( h ' s+1 + = Op (h\ /2 ) 



This gives that the item in (A.16) is $O_P(h_q^{1/2}) = o_P(1)$.

For (A.17), Lemma A.2, $E[S_i(\theta_t)] = 0$, (3.5), $Q^*(X;\theta_t) = Q^*(X;\theta) + O(h_q)$ uniformly with respect to $t$ in the support of $K_q$ and $X$ in $x + h\mathcal K$ (as easily seen arguing as in the equation above), and Assumptions F, K and X give



< 



< 



Qp(i) 
/4 /2 

Qp(i) 
/4 /2 

Qp(i) 
/4 /2 

OKI) 

;i/2 



1 " 

-^^{S, (6)-S i (6 t )}dK q (t) 
(nh d ) i=1 

/ 
— -— ^{s.^-s,^)} W)l 
(nft d ) i=1 

- i n 

J (nh ) i=1 



E 



pl/2 



1 ™ 

-^^{S, (9)-S l (9 t )} 



(nh d ) 1/2 



i=l 



\dK q {t)\ 



l 



T (x+hz-9)+Ch q 



'(x+hz;8)-Ch q 



f(y\x + hz)dy I (z G /C) /(a; + /iz)dz 



- Op(1) 
= O p (1).D 

Appendix B: Proofs of intermediary results 
B.1. Proof of Lemma A.1. Recall

$$U(X-x)^Tb = U(X-x)^TH^{-1}B = U\Big(\frac{X-x}{h}\Big)^TB$$

and define

$$\mathcal L(B;\theta) = \mathcal L(b;\theta) = \frac{1}{h^d}E\Big[\Big\{\ell_\alpha\Big(Y - U\Big(\frac{X-x}{h}\Big)^TB\Big) - \ell_\alpha(Y)\Big\}K_h(X-x)\Big].$$



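The change of variable $x_1 = x + hz$ gives

$$\begin{aligned}\mathcal L(B;\theta) &= \frac{1}{h^d}\int\Big(\int\Big\{\ell_\alpha\Big(y - U\Big(\frac{x_1-x}{h}\Big)^TB\Big) - \ell_\alpha(y)\Big\}f(y|x_1)\,dy\Big)f(x_1)K\Big(\frac{x_1-x}{h}\Big)dx_1\\ &= \int\Big(\int\big(\ell_\alpha(y - U(z)^TB) - \ell_\alpha(y)\big)f(y|x+hz)\,dy\Big)f(x+hz)K(z)\,dz,\end{aligned}\tag{B.1}$$

showing that $\mathcal L(B;\theta)$ is also defined for $h = 0$.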

Proof of (i). It is sufficient to show that $B^*(\theta) = \arg\min_{B\in\mathbb R^P}\mathcal L(B;\theta)$ exists and is unique. Note that $B \mapsto \mathcal L(B;\theta)$ is convex by (B.1) because $\ell_\alpha(\cdot)$ is convex. Since $\lim_{|t|\to+\infty}\ell_\alpha(t) = +\infty$ and $U(z)^TB$ diverges almost everywhere when $\|B\|$ diverges, (B.1) gives that $\lim_{\|B\|\to+\infty}\mathcal L(B;\theta) = +\infty$. Hence $\mathcal L(B;\theta)$ has a minimum. We show that this minimum is unique by showing that $B \mapsto \mathcal L(B;\theta)$ is strictly convex for all $\theta$ in $\Theta^0$. We compute the first and second $B$-derivatives of $\mathcal L(B;\theta)$. Equation (1.2) gives that for almost all $B$,

$$\frac{\partial\ell_\alpha(y - U(z)^TB)}{\partial B} = 2\big(I(y \le U(z)^TB) - \alpha\big)U(z),$$

which is bounded for $z$ in the compact $\mathcal K$. Assumptions F, K and X, the Lebesgue Dominated Convergence Theorem and (B.1) yield that

$$\begin{aligned}\mathcal L^{(1)}(B;\theta) = \frac{\partial\mathcal L(B;\theta)}{\partial B} &= 2\int\Big(\int\big(I(y \le U(z)^TB) - \alpha\big)f(y|x+hz)\,dy\Big)f(x+hz)U(z)K(z)\,dz\\ &= 2\int F(U(z)^TB|x+hz)f(x+hz)U(z)K(z)\,dz - 2\alpha\int f(x+hz)U(z)K(z)\,dz.\end{aligned}\tag{B.2}$$

Applying again the Dominated Convergence Theorem yields that

$$\mathcal L^{(2)}(B;\theta) = \frac{\partial^2\mathcal L(B;\theta)}{\partial B\,\partial B^T} = 2\int f(U(z)^TB|x+hz)f(x+hz)U(z)U(z)^TK(z)\,dz. \tag{B.3}$$

For all $A \ne 0$ in $\mathbb R^P$, (B.3), Assumptions F, K and $x \in \mathcal X$ give

$$A^T\mathcal L^{(2)}(B;\theta)A = 2\int f(U(z)^TB|x+hz)f(x+hz)A^TU(z)U(z)^TA\,K(z)\,dz = 2\int f(U(z)^TB|x+hz)f(x+hz)\big(U(z)^TA\big)^2K(z)\,dz > 0. \tag{B.4}$$

Hence $\mathcal L^{(2)}(\cdot;\theta)$ is a positive definite symmetric matrix for all $\theta$ in $\Theta^0$ and $B$ in $\mathbb R^P$, so that the strictly convex function $\mathcal L(B;\theta)$ achieves its minimum at a unique $B^*(\theta)$.
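Part (i) concerns the population criterion $\mathcal L(B;\theta)$; the estimator itself minimises a kernel-weighted sample analogue. As an illustration only, the sketch below computes a local linear ($p = 1$, $d = 1$) quantile fit by brute force over "elemental" fits. It relies on a standard linear-programming property of quantile regression (some minimiser interpolates two observations with positive weight), which is assumed here rather than taken from the paper. The Epanechnikov kernel and all function names are hypothetical choices, and the $O(n^3)$ cost restricts it to small samples.

```python
import itertools

import numpy as np


def check_loss(t, alpha):
    """ell_alpha(t) = 2 t (alpha - 1{t < 0}), i.e. twice the usual check function."""
    return 2.0 * t * (alpha - (t < 0))


def epanechnikov(u):
    """Illustrative compactly supported kernel on [-1, 1]."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)


def local_linear_quantile(X, Y, x, h, alpha, kernel=epanechnikov):
    """Minimise sum_i ell_alpha(Y_i - b0 - b1 (X_i - x)) K((X_i - x) / h) over (b0, b1).

    Brute force over lines through two observations with positive kernel weight.
    Returns (b0, b1): estimates of Q(alpha | x) and of its derivative in x.
    """
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    w = kernel((X - x) / h)
    idx = np.flatnonzero(w > 0)
    best_obj, best_b = np.inf, None
    for i, j in itertools.combinations(idx, 2):
        if X[i] == X[j]:
            continue  # no line through two points sharing a design value
        b1 = (Y[j] - Y[i]) / (X[j] - X[i])
        b0 = Y[i] + b1 * (x - X[i])  # value at x of the line through points i and j
        obj = np.sum(w * check_loss(Y - b0 - b1 * (X - x), alpha))
        if obj < best_obj:
            best_obj, best_b = obj, (b0, b1)
    if best_b is None:
        raise ValueError("need two distinct design points with positive kernel weight")
    return best_b


rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 60)
Y = np.sin(2 * np.pi * X) + 0.3 * rng.standard_normal(60)
print(local_linear_quantile(X, Y, x=0.4, h=0.25, alpha=0.7))
```

The returned pair is on the original scale $b = H^{-1}B$, so the second entry estimates the derivative of $Q(\alpha|\cdot)$ at $x$.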

Proof of (ii). Consider a fixed $\bar h$ to be chosen small enough, and let $\bar\Theta^0$ be the corresponding set $\Theta^0$, which is compact. The proof of (i) yields that $B^*(\theta)$ is unique for all $\theta$ in $\bar\Theta^0$ and is the unique solution of the first-order condition $\mathcal L^{(1)}(B;\theta) = 0$, that is

$$\int F(U(z)^TB|x+hz)f(x+hz)U(z)K(z)\,dz = \alpha\int f(x+hz)U(z)K(z)\,dz, \tag{B.5}$$

see (B.2), so that (A.1) is proved. In particular, $B^*(\alpha;0,x)$ is the unique solution of $\mathcal L^{(1)}(B;\alpha,0,x) = 0$. If $h = 0$, the first-order condition (A.1) is equivalent to

$$\int F(U(z)^TB^*(\alpha;0,x)|x)U(z)K(z)\,dz = \alpha\int U(z)K(z)\,dz.$$

Let $B_0(\alpha|x) = (Q(\alpha|x),0,\dots,0)^T$ in $\mathbb R^P$. Since $U(z)^TB_0(\alpha|x) = Q(\alpha|x)$, $B_0(\alpha|x)$ satisfies the first-order condition above. Hence $B^*(\alpha;0,x) = B_0(\alpha|x)$ by uniqueness.

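We now show that $B^*(\theta)$ is continuously differentiable in $\theta$ over $\bar\Theta^0$ and give bounds for $B^*(\theta)$, $\partial\mathcal L^{(1)}(B^*(\theta);\theta)/\partial\theta^T$ and $\mathcal L^{(2)}(B^*(\theta);\theta)$. As shown above, $B \mapsto \mathcal L^{(1)}(B;\theta)$ is continuously differentiable and $\mathcal L^{(2)}(B;\theta)$ is a symmetric positive definite matrix for all $B$ in $\mathbb R^P$, and so has an inverse. Assumptions F, K and X yield that $F(U(z)^TB|x+hz)$ and $f(x+hz)$ are bounded and have bounded $\theta$-partial derivatives over $\bar\Theta^0$ provided $\bar h$ is small enough. Hence the Dominated Convergence Theorem and (B.2) yield that $\mathcal L^{(1)}(B;\theta)$ is continuously differentiable in $\theta$ over $\bar\Theta^0$. Then the Implicit Function Theorem (see e.g. Zeidler (1985), p. 130) and the first-order condition $\mathcal L^{(1)}(B^*(\theta);\theta) = 0$ yield that $B^*(\theta)$ is continuously differentiable in $\theta$ over $\bar\Theta^0$, with

$$\frac{\partial B^*(\theta)}{\partial\theta^T} = -\big[\mathcal L^{(2)}(B^*(\theta);\theta)\big]^{-1}\frac{\partial\mathcal L^{(1)}(B^*(\theta);\theta)}{\partial\theta^T}. \tag{B.6}$$

Recall now that $\Theta^0 \subset \bar\Theta^0$ when $h$ tends to 0. Hence continuity of $B^*(\cdot)$ and $\partial\mathcal L^{(1)}(\cdot,\cdot)/\partial\theta^T$ and compactness of $\bar\Theta^0$ give

$$\lim_{h\to0}\sup_{\theta\in\Theta^0}\|B^*(\theta) - B^*(\alpha;0,x)\| = 0,$$

$$\lim_{h\to0}\sup_{\theta\in\Theta^0}\Big\|\frac{\partial\mathcal L^{(1)}(B^*(\theta);\theta)}{\partial\theta^T} - \frac{\partial\mathcal L^{(1)}(B^*(\alpha;0,x);\alpha,0,x)}{\partial\theta^T}\Big\| = 0. \tag{B.7}$$

Since the first limit is (A.2), (ii) is proved.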

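Proof of (iii). We bound the partial derivative (B.6). Observe that (A.2), the expression of $B^*(\alpha;0,x)$, the compactness of $\bar\Theta^0$ and Assumption F yield that there is a compact $\mathcal B$ such that $B^*(\theta)$ is in $\mathcal B$ for all $\theta$ in $\Theta^0$, provided $\bar h$ is small enough. Then (B.3) and (B.4) give that, uniformly in $\theta$ in $\Theta^0$,

$$\mathcal L^{(2)}(B^*(\theta);\theta) \succ C\int_{B(0,1)}U(z)U(z)^T\,dz.$$

Hence (B.6) and (B.7) give

$$\limsup_{h\to0}\sup_{\theta\in\Theta^0}\Big\|\frac{\partial B^*(\theta)}{\partial\theta^T}\Big\| \le C\Big\|\Big(\int_{B(0,1)}U(z)U(z)^T\,dz\Big)^{-1}\Big\|\,\limsup_{h\to0}\sup_{\theta\in\Theta^0}\Big\|\frac{\partial\mathcal L^{(1)}(B^*(\theta);\theta)}{\partial\theta^T}\Big\| \le C. \tag{B.8}$$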



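Let us now return to the proof of (iii). The differentiability results above yield that $\theta \in \Theta^1 \mapsto Q^*(x';\theta) = U((x'-x)/h)^TB^*(\theta)$ is continuously differentiable in $\theta$. We have for all $x, x'$ in $\mathcal X$ and $h \ge \underline h$,

$$\Big\|U\Big(\frac{x'-x}{h}\Big)\Big\| \le \frac{C}{\underline h^p}, \qquad \Big\|\frac{\partial}{\partial\theta^T}U\Big(\frac{x'-x}{h}\Big)\Big\| \le \frac{C}{\underline h^p}\Big(\frac{1}{\underline h} + 1\Big).$$

Hence for $\bar h$ small enough, (A.2) and (B.8) yield that for all $\theta$ in $\Theta^1$ and $x'$ in $\mathcal X$,

$$\Big\|\frac{\partial Q^*(x';\theta)}{\partial\theta^T}\Big\| \le \Big\|\frac{\partial}{\partial\theta^T}U\Big(\frac{x'-x}{h}\Big)\Big\|\,\|B^*(\theta)\| + \Big\|U\Big(\frac{x'-x}{h}\Big)\Big\|\,\Big\|\frac{\partial B^*(\theta)}{\partial\theta^T}\Big\| \le C\underline h^{-p}(1+\underline h^{-1}).$$

The Taylor inequality shows that (iii) is proved.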
Proof of (iv). The change of variable $x' = x + hz$ shows that it is sufficient to prove that, for all $\theta$ in $\Theta^0$ and $z$ in $\mathcal K$,

$$f(Q^*(x+hz;\theta)|x+hz) \ge C \quad\text{with}\quad f(Q^*(x+hz;\theta)|x+hz) = f(U(z)^TB^*(\theta)|x+hz),$$

which is true for $\bar h$ small enough by (A.2) and under Assumption F, which gives that $f(y|x) \ge C > 0$ for $y$ in any compact subset of $\mathbb R$ and any $x$ in $\mathcal X_0$. □



B.2. Proof of Proposition A.1. The proof of the Proposition uses the two following Lemmas. In what follows, the stochastic processes $R(\cdot;\cdot)$, $R^1(\cdot;\cdot)$ and $R^2(\cdot;\cdot)$ have the same distribution as the $\tilde R_i(\cdot;\cdot)$, $R_i^1(\cdot;\cdot)$ and $R_i^2(\cdot;\cdot)$ in (A.9), (A.10) and (A.11). Define also

$$\delta(\beta,\theta) = U\Big(\frac{X-x}{h}\Big)^T\frac{\beta}{(nh^d)^{1/2}}. \tag{B.9}$$

Lemma B.1. Under Assumptions F, K and X, we have

$$\operatorname{Var}(R(\beta,e;\theta)) \le C\,\frac{\|e\|^2(\|\beta\| + \|e\|)}{n(nh^d)^{1/2}}.$$

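Proof of Lemma B.1. Observe that $\ell_\alpha(t) = 2\int_0^t(\alpha - I(z < 0))\,dz$. Hence (A.9) and (B.9) yield

$$R(\beta,e;\theta) = 2K_h(X-x)\int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(e,\theta)}\big(I(Y \le Q^*(X;\theta)+t) - I(Y \le Q^*(X;\theta))\big)\,dt. \tag{B.10}$$

The Cauchy-Schwarz inequality gives

$$\begin{aligned}R(\beta,e;\theta)^2 &= 4K_h(X-x)^2\Big(\int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(e,\theta)}\big(I(Y \le Q^*(X;\theta)+t) - I(Y \le Q^*(X;\theta))\big)\,dt\Big)^2\\ &\le 4K_h(X-x)^2|\delta(e,\theta)|\,\Big|\int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(e,\theta)}\big(I(Y \le Q^*(X;\theta)+t) - I(Y \le Q^*(X;\theta))\big)^2dt\Big|\\ &\le 4K_h(X-x)^2|\delta(e,\theta)|\,\Big|\int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(e,\theta)}I\big(|Y - Q^*(X;\theta)| \le |t|\big)\,dt\Big|.\end{aligned}$$

Hence Assumption F and (B.9) give

$$\begin{aligned}E[R^2(\beta,e;\theta)|X] &\le 4K_h(X-x)^2|\delta(e,\theta)|\,\Big|\int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(e,\theta)}\Big\{\int I\big(|y - Q^*(X;\theta)| \le |t|\big)f(y|X)\,dy\Big\}dt\Big|\\ &\le 8K_h(X-x)^2\|f(\cdot|\cdot)\|_\infty|\delta(e,\theta)|\,\Big|\int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(e,\theta)}|t|\,dt\Big|\\ &\le CK_h(X-x)^2\,\delta(e,\theta)^2\big(|\delta(\beta,\theta)| + |\delta(e,\theta)|\big)\\ &\le C\,\frac{K^2(\frac{X-x}{h})\,\|U(\frac{X-x}{h})\|^3}{(nh^d)^{3/2}}\,\|e\|^2(\|\beta\| + \|e\|).\end{aligned}$$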
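Then, under Assumptions K and X,

$$\begin{aligned}\operatorname{Var}(R(\beta,e;\theta)) \le E[R^2(\beta,e;\theta)] = E\big[E[R^2(\beta,e;\theta)|X]\big] &\le C\,\frac{\|e\|^2(\|\beta\| + \|e\|)}{(nh^d)^{3/2}}\int K^2\Big(\frac{x'-x}{h}\Big)\Big\|U\Big(\frac{x'-x}{h}\Big)\Big\|^3f_X(x')\,dx'\\ &= C\,\frac{h^d\|e\|^2(\|\beta\| + \|e\|)}{(nh^d)^{3/2}}\int K^2(z)\|U(z)\|^3f_X(x+hz)\,dz \le C\,\frac{\|e\|^2(\|\beta\| + \|e\|)}{n(nh^d)^{1/2}}.\quad\square\end{aligned}$$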
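The integral representation (B.10) rests on an elementary identity for the loss $\ell_\alpha$, sometimes called Knight's identity. The sketch below checks it numerically with $u = Y - Q^*(X;\theta)$, $\delta(\beta,\theta)$ and $\delta(e,\theta)$ treated as free real numbers and the kernel factor set to 1. The function names are hypothetical, and the bracket in `R_direct` follows (A.9) with the indicator $I(Y \le Q^*(X;\theta))$ in the first-order term.

```python
import random


def ell(t, alpha):
    """ell_alpha(t) = 2 t (alpha - 1{t < 0}) = 2 * int_0^t (alpha - 1{z < 0}) dz."""
    return 2.0 * t * (alpha - (t < 0))


def int_indicator(u, a, b):
    """int_a^b 1{u <= s} ds for a <= b, i.e. the length of [max(a, u), b]."""
    return max(0.0, b - max(a, u))


def signed_int(u, lo, hi):
    """int_lo^hi (1{u <= s} - 1{u <= 0}) ds, for either ordering of lo and hi."""
    if lo <= hi:
        return int_indicator(u, lo, hi) - (hi - lo) * (u <= 0)
    return -signed_int(u, hi, lo)


def R_direct(u, d_beta, d_e, alpha):
    """Bracket of (A.9) with u = Y - Q*(X; theta) and the kernel factor set to 1."""
    return (ell(u - d_beta - d_e, alpha) - ell(u - d_beta, alpha)
            - 2.0 * ((u <= 0) - alpha) * d_e)


def R_integral(u, d_beta, d_e):
    """Right-hand side of (B.10) with the kernel factor set to 1."""
    return 2.0 * signed_int(u, d_beta, d_beta + d_e)


random.seed(0)
worst = 0.0
for _ in range(10_000):
    u, db, de = (random.uniform(-2, 2) for _ in range(3))
    a = random.uniform(0.05, 0.95)
    worst = max(worst, abs(R_direct(u, db, de, a) - R_integral(u, db, de)))
print(f"largest discrepancy: {worst:.2e}")
```

The same helper also shows that $R(\beta;\theta)$ in (B.11), which corresponds to `R_integral(u, 0.0, d)`, is non-negative and bounded by $2|\delta(\beta,\theta)|$.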
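Define

$$\mathcal R = \mathcal R(t_\beta,t_\varepsilon,\Theta^1) = \{R(\beta,e;\theta),\ (\beta,e,\theta) \in B(0,t_\beta)\times B(0,t_\varepsilon)\times\Theta^1\}.$$

The next lemma studies coverings of $\mathcal R$ with brackets $[\underline R,\overline R]$. Recall that the bracket $[\underline R,\overline R] = [\underline R(X,Y),\overline R(X,Y)]$ is the set of random variables $r = r(X,Y)$ such that $\underline R \le r \le \overline R$ almost surely.

Lemma B.2. Under Assumptions F, K and X, and if $t_\beta + t_\varepsilon \ge 1$ and $n$ is large enough,

(i) There are some $\sigma^2$ and $w$, with

$$\sigma^2 \asymp \frac{t_\varepsilon^2(t_\varepsilon + t_\beta)}{n(nh^d)^{1/2}}, \qquad w \asymp \frac{t_\beta + t_\varepsilon}{(nh^d)^{1/2}},$$

such that for all integer numbers $k \ge 2$ and $(\beta,e,\theta)$ in $B(0,t_\beta)\times B(0,t_\varepsilon)\times\Theta^1$,

$$E\big|R(\beta,e;\theta) - E[R(\beta,e;\theta)]\big|^k \le \frac{k!}{2}w^{k-2}\sigma^2.$$

(ii) Let $\tau$ in $(0,1)$ be a bracket length. There is a set of brackets $\mathcal B_\tau = \{[\underline R_{j,\tau},\overline R_{j,\tau}],\ 1 \le j \le e^{H(\tau)}\}$ such that $\mathcal R \subset \bigcup_{1\le j\le e^{H(\tau)}}[\underline R_{j,\tau},\overline R_{j,\tau}]$,

$$E\big|\overline R_{j,\tau} - \underline R_{j,\tau}\big|^k \le \frac{k!}{2}w^{k-2}\tau^2\quad\text{for all integer numbers } k \ge 2 \text{ and } j \text{ in } [1,e^{H(\tau)}],$$

$$H(\tau) \le C\log\Big(\frac{n(t_\beta + t_\varepsilon)}{\tau}\Big)\quad\text{for all } \tau,\ t_\beta \text{ and } t_\varepsilon.$$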



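Proof of Lemma B.2. Define for $\beta$ in $\mathbb R^P$

$$R(\beta;\theta) = 2K_h(X-x)\int_0^{\delta(\beta,\theta)}\big(I(Y \le Q^*(X;\theta)+u) - I(Y \le Q^*(X;\theta))\big)\,du.$$

Let $\operatorname{sgn}(t) = I(t > 0) - I(t < 0)$. Observe that $R(\beta;\theta) \ge 0$ with

$$\begin{aligned}R(\beta;\theta) &= 2K_h(X-x)\int_0^{|\delta(\beta,\theta)|}\big|I\big(Y \le Q^*(X;\theta) + \operatorname{sgn}(\delta(\beta,\theta))u\big) - I(Y \le Q^*(X;\theta))\big|\,du\\ &= 2K_h(X-x)|\delta(\beta,\theta)|\int_0^1\big|I\big(Y \le Q^*(X;\theta) + \delta(\beta,\theta)v\big) - I(Y \le Q^*(X;\theta))\big|\,dv\\ &= 2K_h(X-x)|\delta(\beta,\theta)|\int_0^1I\big(Y - Q^*(X;\theta)\text{ lies between 0 and }\delta(\beta,\theta)v\big)\,dv.\end{aligned}\tag{B.11}$$

(B.10) and $\delta(\beta,\theta) + \delta(e,\theta) = \delta(\beta+e,\theta)$ give

$$R(\beta,e;\theta) = R(\beta+e;\theta) - R(\beta;\theta). \tag{B.12}$$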
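It also follows from (B.9) and Assumption K that for all $\beta$ in $B(0,t_\beta + t_\varepsilon)$ and all $\theta$ in $\Theta^1$

$$0 \le R(\beta;\theta) \le 2K\Big(\frac{X-x}{h}\Big)|\delta(\beta,\theta)| \le 2\Big\|U\Big(\frac{X-x}{h}\Big)\Big\|K\Big(\frac{X-x}{h}\Big)\frac{t_\beta + t_\varepsilon}{(nh^d)^{1/2}} \le \frac w2, \qquad w \asymp \frac{t_\beta + t_\varepsilon}{(nh^d)^{1/2}}. \tag{B.13}$$

Part (i) follows from Lemma B.1 and (B.12), which give

$$\begin{aligned}E\big|R(\beta,e;\theta) - E[R(\beta,e;\theta)]\big|^k &= E\Big[\big|R(\beta+e;\theta) - E[R(\beta+e;\theta)] - \big(R(\beta;\theta) - E[R(\beta;\theta)]\big)\big|^{k-2}\,\big|R(\beta,e;\theta) - E[R(\beta,e;\theta)]\big|^2\Big]\\ &\le \Big(2\cdot\frac w2\Big)^{k-2}\operatorname{Var}(R(\beta,e;\theta)) \le w^{k-2}\sigma^2 \le \frac{k!}{2}w^{k-2}\sigma^2.\end{aligned}$$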



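The proof of part (ii) will be divided in three steps. Let $\mathcal R_t$ be $\{R(\beta;\theta),\ (\beta,\theta) \in B(0,t)\times\Theta^1\}$. For the sake of brevity we abbreviate $\underline R_{j,\tau}$, $\overline R_{j,\tau}$ into $\underline R_j$, $\overline R_j$.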

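Step 1: Coverings of $\mathcal R$ and $\mathcal R_t$, $t = t_\beta + t_\varepsilon \ge 1$. We show in this step that it is sufficient to find a covering of $\mathcal R_t$ with $H(\tau) = H(\tau;t)$ brackets satisfying

$$E\big|\overline R_j - \underline R_j\big|^k \le \frac{k!}{2}\Big(\frac w2\Big)^{k-2}\frac{\tau^2}{8}, \tag{B.14}$$

$$H(\tau) \le C\log\Big(\frac{nt}{\tau}\Big). \tag{B.15}$$

Indeed, consider two such coverings of $\mathcal R_{t_\beta}$ and $\mathcal R_{t_\beta+t_\varepsilon}$,

$$\mathcal R_{t_\beta} \subset \bigcup_{1\le j\le e^{H_1(\tau)}}[\underline R_j^1,\overline R_j^1], \qquad \mathcal R_{t_\beta+t_\varepsilon} \subset \bigcup_{1\le j\le e^{H_2(\tau)}}[\underline R_j^2,\overline R_j^2],$$

with $H_1(\tau) \le H_2(\tau) = H(\tau;t_\beta + t_\varepsilon)$. Consider an $R(\beta,e;\theta)$ in $\mathcal R$. Since $R(\beta;\theta) \in [\underline R_{j_1}^1,\overline R_{j_1}^1]$ and $R(\beta+e;\theta) \in [\underline R_{j_2}^2,\overline R_{j_2}^2]$ for some $j_1$ and $j_2$, (B.12) implies that $R(\beta,e;\theta) \in [\underline R_{j_2}^2 - \overline R_{j_1}^1,\ \overline R_{j_2}^2 - \underline R_{j_1}^1]$. Hence these $e^{H_1(\tau)+H_2(\tau)}$ brackets form a covering of $\mathcal R$ with, using (B.14) and (B.15),

$$\begin{aligned}E\big|\overline R_{j_2}^2 - \underline R_{j_1}^1 - \big(\underline R_{j_2}^2 - \overline R_{j_1}^1\big)\big|^k &\le 2^{k-1}\Big(E\big|\overline R_{j_2}^2 - \underline R_{j_2}^2\big|^k + E\big|\overline R_{j_1}^1 - \underline R_{j_1}^1\big|^k\Big)\\ &\le 2^k\,\frac{k!}{2}\Big(\frac w2\Big)^{k-2}\frac{\tau^2}{8} = \frac{k!}{2}w^{k-2}\frac{\tau^2}{2} \le \frac{k!}{2}w^{k-2}\tau^2,\end{aligned}$$

$$H(\tau) = H_1(\tau) + H_2(\tau) \le C\log\Big(\frac{n(t_\beta + t_\varepsilon)}{\tau}\Big).$$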



Step 2: Preliminary results for the construction of a covering of $\mathcal R_t$. We bound the increments of $(\beta,\theta) \mapsto Q^*(X;\theta)$, $K_h(X-x)$ and $\delta(\beta,\theta)$. Lemma A.1(iii) gives that for all $\theta$, $\theta'$ in $\Theta^1$

$$|Q^*(X;\theta) - Q^*(X;\theta')| \le C\underline h^{-p}(1+\underline h^{-1})\|\theta - \theta'\|.$$



Under Assumption K,



:',2 



For the increments of $\delta(\beta,\theta)$, define $U = U(X-x)$, $U' = U(X-x')$ and $H' = H(h')$. This gives
\8{fi,9)-&{fi'J)\ 

- " r 1 / H 1 H' 1 \ 



U T ^L (/? - /3') + (U' - U) T -^^P' + \J' T 



H' 



C 



< 



(nh d ) 
H 1 



1/2 



(n/i rf ) 1/2 
C(l +t) 

1/2 



||/3 -/3'|| + |hr-z'|| 



nh d ) 1/2 
H 1 



1/2 



ll/3'll+C||/3'| 



(nh d ) 1/2 (nh /d ) 1/2 

H 1 H' 1 



(n/i d ) 1/2 {nh' d ) 



1/2 



f - + |k - + I |ft - < - /3'|| + || 



hP (nh d ^j 

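Step 3: Construction of the covering of $\mathcal R_t$. Define

$$\rho(q,\delta) = |I(q \le \delta) - I(q \le 0)| = I(q \in (0,\delta])\,I(\delta > 0) + I(q \in (\delta,0])\,I(\delta < 0), \qquad r(q,\delta) = \int_0^1\rho(q,\delta v)\,dv.$$

Hence (B.11) shows

$$R(\beta;\theta) = 2K_h(X-x)|\delta(\beta,\theta)|\,r\big(Y - Q^*(X;\theta),\delta(\beta,\theta)\big).$$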



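For any $\eta > 0$, there exist functions $\underline\rho(q,\delta) = \underline\rho_\eta(q,\delta)$ and $\overline\rho(q,\delta) = \overline\rho_\eta(q,\delta)$ and an open set $D = D_\eta \subset \mathbb R^2$ such that

ρ-(i) $0 \le \underline\rho(q,\delta) \le \rho(q,\delta) \le \overline\rho(q,\delta) \le 1$ for all $(q,\delta)$, with $\underline\rho(q,\delta) = \rho(q,\delta) = \overline\rho(q,\delta)$ if $(q,\delta) \in \mathbb R^2\setminus D_\eta$;

ρ-(ii) $\displaystyle\sup_{(q,\delta)\in D_\eta}\Big(\Big|\frac{\partial\underline\rho(q,\delta)}{\partial q}\Big| + \Big|\frac{\partial\underline\rho(q,\delta)}{\partial\delta}\Big| + \Big|\frac{\partial\overline\rho(q,\delta)}{\partial q}\Big| + \Big|\frac{\partial\overline\rho(q,\delta)}{\partial\delta}\Big|\Big) \le C\eta^{-1/2}$;

ρ-(iii) $D \subset D' = \{(q,\delta) \in \mathbb R^2;\ |q| \le C\eta^{1/2}\text{ or }|q - \delta| \le C\eta^{1/2}\}$.

Define $\underline r(q,\delta) = \int_0^1\underline\rho(q,v\delta)\,dv$, $\overline r(q,\delta) = \int_0^1\overline\rho(q,v\delta)\,dv$ and

$$\underline R(\beta,\theta) = 2K_h(X-x)|\delta(\beta,\theta)|\,\underline r\big(Y - Q^*(X;\theta),\delta(\beta,\theta)\big), \qquad \overline R(\beta,\theta) = 2K_h(X-x)|\delta(\beta,\theta)|\,\overline r\big(Y - Q^*(X;\theta),\delta(\beta,\theta)\big).$$

Since $K(\cdot) \ge 0$, ρ-(i) gives that these functions are such that

$$\underline R(\beta,\theta) \le R(\beta;\theta) \le \overline R(\beta,\theta). \tag{B.16}$$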

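We now bound $\underline R(\beta,\theta) - \underline R(\beta',\theta')$ and $\overline R(\beta,\theta) - \overline R(\beta',\theta')$. We have

$$\begin{aligned}|\underline R(\beta,\theta) - \underline R(\beta',\theta')| &\le 2|K_h(X-x) - K_{h'}(X-x')|\,|\delta(\beta,\theta)|\,\underline r\big(Y - Q^*(X;\theta),\delta(\beta,\theta)\big)\\ &\quad + 2K_{h'}(X-x')\,|\delta(\beta,\theta) - \delta(\beta',\theta')|\,\underline r\big(Y - Q^*(X;\theta),\delta(\beta,\theta)\big)\\ &\quad + 2K_{h'}(X-x')\,|\delta(\beta',\theta')|\,\big|\underline r\big(Y - Q^*(X;\theta),\delta(\beta,\theta)\big) - \underline r\big(Y - Q^*(X;\theta'),\delta(\beta',\theta')\big)\big|.\end{aligned}$$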
Hence Step 2, ρ-(i,ii), (B.13) and the Taylor inequality give, for all $(\beta,\theta)$, $(\beta',\theta')$ in $B(0,t)\times\Theta^1$,

$$|\underline R(\beta,\theta) - \underline R(\beta',\theta')| \le C\,\frac{(1+\eta^{-1/2})(1+t)}{h^{p+1}(nh^d)^{1/2}}\big(\|\theta - \theta'\| + \|\beta - \beta'\|\big).$$

Arguing symmetrically gives

$$|\overline R(\beta,\theta) - \overline R(\beta',\theta')| \le C\,\frac{(1+\eta^{-1/2})(1+t)}{h^{p+1}(nh^d)^{1/2}}\big(\|\theta - \theta'\| + \|\beta - \beta'\|\big).$$

We now construct the brackets. Recall that there is a covering of $B(0,t)\times\Theta^1$ with $N$ balls $B((\beta_j,\theta_j),r)$, $\theta_j = (\alpha_j,h_j,x_j)$, with center $(\beta_j,\theta_j)$ and radius $r$ such that

$$N \le \max\Big(1,\Big(\frac{Ct}{r}\Big)^{P+d+2}\Big), \tag{B.17}$$

see van de Geer (1999, p. 20). Define



(l + fT 1/2 ) (1 + t) -/ -, s (1 + 77- 1 / 2 ) (1 + t) 

g^RifiiA)- ^ ' u l d \,Z \ R j = R(/3 j ,9 j ) + Cr 1 ± 



(B.18) 



h p+l (nh d y/ 2 



Rj = max (0, 3-j) , Rj = min [ — , R 



h p+1 (nh d yl 2 



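Bounding $\underline{R}(\beta,\theta)-\underline{R}_j$ and $\bar{R}(\beta,\theta)-\bar{R}_j$ for $(\beta,\theta)$ in $B((\beta_j,\theta_j),r_j)$, (B.16) and (B.18) give

$$\underline{R}_j \le \underline{R}(\beta,\theta) \le R(\beta,\theta) \le \bar{R}(\beta,\theta) \le \bar{R}_j. \tag{B.19}$$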



It then follows that $\{[\underline{R}_j,\bar{R}_j],\ j=1,\ldots,N\}$ is a covering of $\mathcal{R}$ with, since $0 \le \underline{R}_j \le \bar{R}_j \le w/2$,

$$|\bar{R}_j-\underline{R}_j| \le \frac{w}{2}. \tag{B.20}$$
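The bracket construction turns a Lipschitz bound into brackets: the value at each center is widened by (Lipschitz constant) × (radius) and then truncated to the known range $[0, w/2]$. The one-dimensional Python sketch below illustrates this mechanism; the function `R`, the constants and the helper `lipschitz_brackets` are hypothetical choices of ours, whereas the proof works in the $(P+d+2)$-dimensional space of $(\beta,\theta)$. It checks that every point of a ball lies in the bracket of that ball's center and that bracket widths stay below $2 \times$ `lip` $\times$ `radius`.

```python
import numpy as np

def lipschitz_brackets(R, centers, radius, lip, upper):
    """Brackets [lo_j, hi_j] for R on the balls B(c_j, radius).

    Assumes |R(b) - R(b')| <= lip * |b - b'| and 0 <= R <= upper everywhere;
    truncating at 0 and at `upper` mimics the max/min step of the construction.
    """
    vals = np.array([R(c) for c in centers])
    lo = np.maximum(0.0, vals - lip * radius)
    hi = np.minimum(upper, vals + lip * radius)
    return lo, hi

# Hypothetical example: R(b) = 0.5 |sin(3b)| is 1.5-Lipschitz with values in [0, 0.5].
def R(b):
    return 0.5 * abs(np.sin(3.0 * b))

radius, lip, upper = 0.05, 1.5, 0.5
n_centers = int(round(1.0 / radius))  # covering [-1, 1] needs about 1/radius balls
centers = np.linspace(-1.0 + radius, 1.0 - radius, n_centers)
lo, hi = lipschitz_brackets(R, centers, radius, lip, upper)
```

In dimension $D$ the number of balls grows like $(1/\text{radius})^D$, which is why the log-covering number in the proof is of order $(P+d+2)\log(1/r_j)$.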
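We now bound $E[(\bar{R}_j-\underline{R}_j)^2]$. (B.19), p-(i,iii), (B.9) and Assumptions K and H give

$$E\left[(\bar{R}_j-\underline{R}_j)^2\right] \le E\left[(\bar{R}_j^*-\underline{R}_j^*)^2\right]$$

and

$$E\left[(\bar{R}_j^*-\underline{R}_j^*)^2\right] \le 2E\left[\left(\bar{R}(\beta_j,\theta_j)-\underline{R}(\beta_j,\theta_j)\right)^2\right] + Cr_j^2\,\frac{(1+\eta^{-1/2})^2(1+t)^2}{h^{2p+2}\,nh^d},$$

where

$$\begin{aligned}2E\left[\left(\bar{R}(\beta_j,\theta_j)-\underline{R}(\beta_j,\theta_j)\right)^2\right] &\le 8E\left[K_{h_j}^2(X-x_j)\,\delta^2(\beta_j,\theta_j)\left(\bar{r}\left(Y-Q^*(X;\theta_j),\delta(\beta_j,\theta_j)\right)-\underline{r}\left(Y-Q^*(X;\theta_j),\delta(\beta_j,\theta_j)\right)\right)^2\right]\\ &\le 8E\left[K_{h_j}^2(X-x_j)\,\delta^2(\beta_j,\theta_j)\int\left(\int_0^1\mathbb{I}\left(\left(y-Q^*(X;\theta_j),v\delta(\beta_j,\theta_j)\right)\in D_\eta\right)dv\right)f(y|X)\,dy\right]\\ &\le \frac{8\|\beta_j\|^2}{n}\int K^2(z)\|U(z)\|^2\left(\int\int_0^1\mathbb{I}\left(\left(y-Q^*(x_j+h_jz;\theta_j),v\delta(\beta_j,\theta_j)\right)\in D_\eta\right)dv\,f(y|x_j+h_jz)\,dy\right)f(x_j+h_jz)\,dz.\end{aligned}$$

This together with (B.20) gives, for any integer $k \ge 2$,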



E 



fc-2 



E 



{Rj-Rj) 



< 



k\ fw\ k 2 (l+tf / 2 1/2 



8 V 2 



Hence (B.14) holds if $\eta$ satisfies



C . 
77 = — mm 
' 3 



'n^+ 2+d \ 1/2 ri/^+ 2+d fnh 2p+2 +^ 



{i + ty j (i + ty \ (i + ty j 

Recall now that $r \le 1$, $t \ge 1$ and that $h \ge Cn^{-1/d}$ under Assumption K. The bound (B.17) for $N = \exp(H(r))$ gives, taking $\eta$ as above,

/ \ 



1, 



Ct 1 



\ 



h- 1 l \ 1/2 nh 2 P+ 2 + d 2 / nh 2 P+ 2 + d \ 2 4 



P+d+2 



P+d+2 



< max I 1 , I 

It then follows, for $n$ large enough, that

$$H(r) \le C\left(3\log t+\log n-\log r\right) \le C\log\frac{nt}{r},$$

and (B.15) is proved. This ends the proof of the lemma.
Let us now return to the proof of Proposition A.1. Define $\mathbf{X} = (X_1,\ldots,X_n)$. The definition of $R_i(\beta,\epsilon;\theta)$ and (A.10) give



E 



sup 

(/3,£,e)GB(o,t )xe(o,t E )xe 1 



E 



< E 



sup 

(£,e,0)££(O,t, 9 )x£(O,t, ! )xe 1 



sup 



+E 



< 21 



sup 

(/3,e,e)eB(0,t f) )xB(0,t £ )xe 1 



sup 

(^,e,e)GB(0,t^)xB(0,t e )xe 1 



i=l 
n 

Tl 

W,e;6)-®[Ri W,e;9)])\X 



E 



{j3,e;6)-E[R i (l3,e ] e)]) 



Let $H(\cdot)$, $\sigma$ and $w$ be as in Lemma B.2. Recall that $t_\beta + t_\epsilon \ge 1$ and that $\sigma \le 1 \le n(t_\beta + t_\epsilon)$ for $n$ large enough under the assumptions on $t_\beta$ and $t_\epsilon$ of the Proposition. It follows from Massart (2007, Theorem 6.8) that



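$$E\left[\sup_{(\beta,\epsilon,\theta)\in B(0,t_\beta)\times B(0,t_\epsilon)\times\Theta^1}\left|\sum_{i=1}^n\left(R_i(\beta,\epsilon;\theta)-E[R_i(\beta,\epsilon;\theta)]\right)\right|\right] \le C\left(n^{1/2}\int_0^\sigma H(u)^{1/2}\,du+(\sigma+w)H(\sigma)\right).$$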



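Since $\sigma \le 1$, Lemma B.2 gives, for all $u$ in $(0,\sigma]$, $H(u) \le C\log(n(t_\beta+t_\epsilon)/u)$. This gives

$$\begin{aligned}n^{1/2}\int_0^\sigma H^{1/2}(u)\,du &\le (n\sigma)^{1/2}\left(\int_0^\sigma H(u)\,du\right)^{1/2} \le C(n\sigma)^{1/2}\left(\int_0^\sigma\log\frac{n(t_\beta+t_\epsilon)}{u}\,du\right)^{1/2}\\ &= C(n\sigma)^{1/2}\left(\sigma\left(\log\frac{n(t_\beta+t_\epsilon)}{\sigma}+1\right)\right)^{1/2} \le Cn^{1/2}\sigma\log^{1/2}\frac{n(t_\beta+t_\epsilon)}{\sigma}.\end{aligned}$$

The order of $\sigma$ given in Lemma B.2, the assumption on $t_\beta+t_\epsilon$ and Assumption K give

$$\log\left(n(t_\beta+t_\epsilon)/\sigma\right) \le C\log n.$$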
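The entropy-integral bound above combines the Cauchy-Schwarz inequality with the closed form $\int_0^\sigma \log(A/u)\,du = \sigma(\log(A/\sigma)+1)$, where $A$ stands for $n(t_\beta+t_\epsilon)$. A short Python check, with illustrative values of $A$ and $\sigma$ chosen by us (the helper names are hypothetical), confirms both numerically.

```python
import math

def midpoint(f, a, b, m=100_000):
    """Midpoint rule for int_a^b f(u) du; it never evaluates f at the log singularity u = 0."""
    h = (b - a) / m
    return h * sum(f(a + (k + 0.5) * h) for k in range(m))

def log_integral_closed_form(A, sigma):
    # int_0^sigma log(A/u) du = sigma * (log(A/sigma) + 1)
    return sigma * (math.log(A / sigma) + 1.0)

A, sigma = 1e3, 0.01  # illustrative values only
lhs = midpoint(lambda u: math.log(A / u), 0.0, sigma)
sqrt_int = midpoint(lambda u: math.sqrt(math.log(A / u)), 0.0, sigma)
# Cauchy-Schwarz: int_0^sigma H^{1/2} <= sigma^{1/2} (int_0^sigma H)^{1/2}
cs_bound = math.sqrt(sigma * log_integral_closed_form(A, sigma))
print(lhs, log_integral_closed_form(A, sigma), sqrt_int, cs_bound)
```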

Substituting gives

$$E\left[\sup_{(\beta,\epsilon,\theta)\in B(0,t_\beta)\times B(0,t_\epsilon)\times\Theta^1}\left|\sum_{i=1}^n\left(R_i(\beta,\epsilon;\theta)-E[R_i(\beta,\epsilon;\theta)]\right)\right|\right] \le C\left(n^{1/2}\sigma\log^{1/2}n+(\sigma+w)\log n\right).$$



B.3. Proof of Proposition A.2. The proof of Proposition A.2 follows the same steps as the proof of Proposition A.1, and we only sketch it. The integral expression of $R(\beta,\epsilon;\theta)$ in (B.10) and the expression of $R_2(\beta,\epsilon;\theta)$ give



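$$R_2(\beta,\epsilon;\theta) = 2K_h(X-x)\int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(\epsilon,\theta)}\left(F(Q^*(X;\theta)+u|X)-F(Q^*(X;\theta)|X)\right)du-\frac{1}{nh^d}\,\epsilon^{\top}J(\theta)(\epsilon+2\beta).$$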
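The definition (3.6) of $J(\theta)$ gives

$$\begin{aligned}R_2(\beta,\epsilon;\theta) &= 2K_h(X-x)\int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(\epsilon,\theta)}\left(F(Q^*(X;\theta)+u|X)-F(Q^*(X;\theta)|X)-uf(Q^*(X;\theta)|X)\right)du\\ &= 2K_h(X-x)\int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(\epsilon,\theta)}u\left\{\int_0^1\left(f(Q^*(X;\theta)+vu|X)-f(Q^*(X;\theta)|X)\right)dv\right\}du.\end{aligned}$$

Define

$$r(\beta;\theta) = 2K_h(X-x)\int_0^{\delta(\beta,\theta)}u\left\{\int_0^1\left(f(Q^*(X;\theta)+vu|X)-f(Q^*(X;\theta)|X)\right)dv\right\}du,$$

which is such that $R_2(\beta,\epsilon;\theta) = r(\beta+\epsilon;\theta)-r(\beta;\theta)$. Since $|f(q+v|x)-f(q|x)| \le L|v|$ under Assumption F, this gives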

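$$\begin{aligned}|R_2(\beta,\epsilon;\theta)| &\le K_h(X-x)\,L\left|\int_{\delta(\beta,\theta)}^{\delta(\beta,\theta)+\delta(\epsilon,\theta)}u^2\,du\right| \le CK_h(X-x)\,|\delta(\epsilon,\theta)|\left(|\delta(\beta,\theta)|+|\delta(\epsilon,\theta)|\right)^2\\ &\le C\,K\left(\frac{X-x}{h}\right)\left\|U\left(\frac{X-x}{h}\right)\right\|^3\frac{\|\epsilon\|\left(\|\beta\|+\|\epsilon\|\right)^2}{(nh^d)^{3/2}},\end{aligned}\tag{B.21}$$

$$|r(\beta;\theta)| \le CK_h(X-x)\,|\delta(\beta,\theta)|^3 \le C\,K\left(\frac{X-x}{h}\right)\left\|U\left(\frac{X-x}{h}\right)\right\|^3\frac{\|\beta\|^3}{(nh^d)^{3/2}}.$$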
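The Taylor step behind (B.21) is the identity $F(q+u)-F(q)-uf(q) = u\int_0^1 (f(q+vu)-f(q))\,dv$ combined with $|f(q+v)-f(q)| \le L|v|$, which bounds the left-hand side by $Lu^2/2$. As an illustration only, the Python sketch below takes $F$ and $f$ to be the standard normal cdf and density (our choice, not the model of the paper), for which $L = \varphi(1) = \sup_x|f'(x)|$; the helper names are hypothetical.

```python
import math

def Phi(x):
    """Standard normal cdf (illustrative choice of F)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal density (illustrative choice of f)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def remainder_direct(q, u):
    # F(q + u) - F(q) - u f(q)
    return Phi(q + u) - Phi(q) - u * phi(q)

def remainder_integral(q, u, m=4000):
    # u * int_0^1 (f(q + v u) - f(q)) dv, midpoint rule in v
    return u * sum(phi(q + (k + 0.5) / m * u) - phi(q) for k in range(m)) / m

# Lipschitz constant of phi: sup |phi'(x)| = sup |x| phi(x) = phi(1).
L = phi(1.0)

q, u = 1.3, 0.1
print(remainder_direct(q, u), remainder_integral(q, u), L * u * u / 2)
```

Integrating the bound $L u^2/2$ against $|u|$ over the interval between $\delta(\beta,\theta)$ and $\delta(\beta,\theta)+\delta(\epsilon,\theta)$ is what produces the $L\int u^2\,du$ term in the first inequality of (B.21).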



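The bound on $|r(\beta;\theta)|$ above gives, for all $\beta$ in $B(0,t_\beta+t_\epsilon)$ and all $\theta$ in $\Theta^1$,

$$|r(\beta;\theta)| \le C\,\frac{(t_\beta+t_\epsilon)^3}{(nh^d)^{3/2}}.$$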



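It follows from (B.21) that, for all $(\beta,\epsilon)$ in $B(0,t_\beta)\times B(0,t_\epsilon)$,

$$\begin{aligned}\operatorname{Var}(R_2(\beta,\epsilon;\theta)) \le E\left[R_2(\beta,\epsilon;\theta)^2\right] &\le C\,\frac{\|\epsilon\|^2(\|\beta\|+\|\epsilon\|)^4}{(nh^d)^3}\int K^2\left(\frac{x'-x}{h}\right)\left\|U\left(\frac{x'-x}{h}\right)\right\|^6 f(x')\,dx'\\ &\le C\,\frac{t_\epsilon^2(t_\beta+t_\epsilon)^4}{(nh^d)^3}\,h^d \le \sigma'^2,\qquad \sigma' \asymp \frac{t_\epsilon(t_\beta+t_\epsilon)^2}{n^{3/2}h^d}.\end{aligned}$$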



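Then constructing brackets as in Lemma B.2 and arguing as in the proof of Proposition A.1 give

$$E\left[\sup_{(\beta,\epsilon,\theta)\in B(0,t_\beta)\times B(0,t_\epsilon)\times\Theta^1}\left|R_2^n(\beta,\epsilon;\theta)-E[R_2^n(\beta,\epsilon;\theta)]\right|\right] \le C\left(n^{1/2}\sigma'\log^{1/2}\frac{n^{5/2}h^d}{t_\epsilon(t_\beta+t_\epsilon)}+(\sigma'+w')\log\frac{n^{5/2}h^d}{t_\epsilon(t_\beta+t_\epsilon)}\right).$$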

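Since (B.21) yields, for all $(\beta,\epsilon,\theta) \in B(0,t_\beta)\times B(0,t_\epsilon)\times\Theta^1$,

$$\left|E[R_2^n(\beta,\epsilon;\theta)]\right| = \left|nE[R_2(\beta,\epsilon;\theta)]\right| \le CnE\left[K\left(\frac{X-x}{h}\right)\left\|U\left(\frac{X-x}{h}\right)\right\|^3\right]\frac{\|\epsilon\|(\|\beta\|+\|\epsilon\|)^2}{(nh^d)^{3/2}} \le C\,\frac{t_\epsilon(t_\epsilon+t_\beta)^2}{(nh^d)^{1/2}},$$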
substituting gives, using $t_\beta \ge 1$, $t_\beta/t_\epsilon = O(nh^d/\log^{1/2}n)$ and Assumption K, which ensures $\log\left(n^{5/2}h^d/(t_\epsilon(t_\beta+t_\epsilon))\right) = O(\log n)$,



E 



sup 

( / 9,e,6»)GB(0,t /3 )xB(0,t E )xe 1 



< 



< C 



sup 

(^,e,e)GB(0,t /J )xB(0,t e )xe 1 



u (t/3 + uy 

nh d 



tp+t t 



{ \S*{/3, e; 0) — E [R 2 (/3, e; 9)]\+E [R 2 n (p, e; 9)} } 
n 5 / 2 h d ' 



1/2 



log 1 / 2 



*e (t/3 + t e ) 



1/2 



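B.4. Proof of Lemma A.2. Lemma A.1(iv) and Assumptions K and F give that there is a $C > 0$ such that, for all $\theta$ in $\Theta^1$ and all $i$,

$$J_i(\theta) \succeq CM_i(\theta),\qquad M_i(\theta) = 2K_h(X_i-x)\,U\left(\frac{X_i-x}{h}\right)U\left(\frac{X_i-x}{h}\right)^{\top}.$$

Hence, for all $\theta$ in $\Theta^1$,

$$\frac{1}{nh^d}\sum_{i=1}^n J_i(\theta) \succeq \frac{C}{nh^d}\sum_{i=1}^n M_i(\theta) = M_n(\theta). \tag{B.22}$$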

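The entries of $M_n(\theta)$ write

$$\frac{C}{nh^d}\sum_{i=1}^n\left(\frac{X_i-x}{h}\right)^{\nu_1+\nu_2}K\left(\frac{X_i-x}{h}\right),\quad 0\le|\nu_1|,|\nu_2|\le p.$$

Let $M(\theta)$ be the matrix with entries

$$\frac{C}{h^d}E\left[\left(\frac{X-x}{h}\right)^{\nu_1+\nu_2}K\left(\frac{X-x}{h}\right)\right],\quad 0\le|\nu_1|,|\nu_2|\le p.$$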



Arguing as in the proof of Proposition A.1 for each of the entries of $M_n(\theta)$ gives

$$\sup_{\theta\in\Theta^1}\|M_n(\theta)-M(\theta)\| = o_P(1).$$
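The entries of $M_n(\theta)$ and $M(\theta)$ are indexed by pairs of multi-indices $\nu$ with $0 \le |\nu| \le p$. The Python sketch below enumerates these multi-indices and evaluates the monomial vector $(z^\nu)_{0\le|\nu|\le p}$. Reading $P$ as the number of such multi-indices, that is the dimension of $\beta$ (our assumption), it checks the standard count $P = \binom{p+d}{d}$. The helper names and the ordering of the multi-indices are hypothetical choices.

```python
from itertools import product
from math import comb, prod

def multi_indices(d, p):
    """All nu = (nu_1, ..., nu_d) with nonnegative entries and |nu| = nu_1 + ... + nu_d <= p."""
    return [nu for nu in product(range(p + 1), repeat=d) if sum(nu) <= p]

def monomials(z, p):
    """Vector (z^nu)_{0 <= |nu| <= p} with z^nu = prod_k z_k ** nu_k."""
    return [prod(zk ** nk for zk, nk in zip(z, nu)) for nu in multi_indices(len(z), p)]

print(len(multi_indices(2, 2)), comb(2 + 2, 2))  # 6 6
print(monomials((2.0, 3.0), 1))                  # [1.0, 3.0, 2.0]
```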

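Assumptions K, F and X give, for all $u$ in $\mathbb{R}^P$, all $x$ in $\mathcal{X}_0$ and $h$ small enough,

$$\begin{aligned}u^{\top}M(\theta)u &= \frac{C}{h^d}E\left[\sum_{0\le|\nu_1|,|\nu_2|\le p}u_{\nu_1}u_{\nu_2}\left(\frac{X-x}{h}\right)^{\nu_1+\nu_2}K\left(\frac{X-x}{h}\right)\right]\\ &= C\sum_{0\le|\nu_1|,|\nu_2|\le p}u_{\nu_1}u_{\nu_2}\int z^{\nu_1+\nu_2}K(z)f(x+hz)\,dz = C\int\left(\sum_{0\le|\nu|\le p}u_\nu z^\nu\right)^2K(z)f(x+hz)\,dz\\ &\ge C\int_{B(0,1)}\left(\sum_{0\le|\nu|\le p}u_\nu z^\nu\right)^2dz \ge C\|u\|^2,\end{aligned}$$

where the last bound uses the fact that

$$u \mapsto \left(\int_{B(0,1)}\left(\sum_{0\le|\nu|\le p}u_\nu z^\nu\right)^2dz\right)^{1/2}$$

is a norm and that norms over $\mathbb{R}^P$ are equivalent. Hence (B.22) and $\|M_n(\theta)-M(\theta)\| = o_P(1)$ yield that there is a $\gamma > 0$ such that

$$\inf_{\theta\in\Theta^1}\gamma(\theta) \ge \inf_{\theta\in\Theta^1}\inf_{\|u\|=1}u^{\top}M_n(\theta)u \ge \gamma + o_P(1).$$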



□ 
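To see the positive-definiteness argument at work, the Python sketch below takes $d = 1$, $p = 2$, the uniform kernel $K(z) = \mathbb{I}(|z|\le 1)/2$, $X$ uniform on $[0,1]$ and the generic constant $C = 1$ (all our choices; the helper names are hypothetical). It builds $M_n(\theta)$ from a sample and compares its smallest eigenvalue with that of the limit $M(\theta)$, whose entries reduce here to $\frac12\int_{-1}^{1} z^{\nu_1+\nu_2}\,dz$.

```python
import numpy as np

def empirical_gram(X, x, h, p):
    """d = 1 version of M_n with C = 1: entries (1/(n h)) sum_i z_i^(a+b) K(z_i)."""
    z = (X - x) / h
    k = 0.5 * (np.abs(z) <= 1.0)                 # uniform kernel on [-1, 1]
    V = np.vander(z, N=p + 1, increasing=True)   # columns z^0, ..., z^p
    return (V * k[:, None]).T @ V / (len(X) * h)

def population_gram(p):
    """Limit M for X ~ Uniform(0, 1) and x at distance >= h from 0 and 1:
    entries int_{-1}^{1} z^(a+b) / 2 dz = 1/(a+b+1) for even a+b, 0 otherwise."""
    a = np.arange(p + 1)
    s = a[:, None] + a[None, :]
    return np.where(s % 2 == 0, 1.0 / (s + 1), 0.0)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 200_000)
lam_n = np.linalg.eigvalsh(empirical_gram(X, x=0.5, h=0.2, p=2)).min()
lam = np.linalg.eigvalsh(population_gram(2)).min()
print(lam_n, lam)
```

The smallest eigenvalue of the limit is bounded away from zero because $u \mapsto \int (\sum_\nu u_\nu z^\nu)^2 K(z)\,dz$ is a squared norm, and the sample version inherits this up to an $o_P(1)$ term.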



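B.5. Proof of Lemma A.3. The first order condition (A.1) implies that $E[S_i(\theta)] = 0$. Consider the $\nu$-coordinate of $S_i(\theta)$,

$$S_{\nu,i}(\theta) = 2\left\{\mathbb{I}\left(Y_i\le Q^*(X_i;\theta)\right)-\alpha\right\}\left(\frac{X_i-x}{h}\right)^{\nu}K\left(\frac{X_i-x}{h}\right).$$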



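Hence Assumptions K and X give, uniformly in $\theta \in \Theta^1$ and for all $i$,

$$\left|\frac{S_{\nu,i}(\theta)}{(nh^d)^{1/2}}\right| \le w'',\quad w'' \asymp (nh^d)^{-1/2},\qquad \operatorname{Var}\left(\frac{S_{\nu,i}(\theta)}{(nh^d)^{1/2}}\right) \le E\left[\frac{S_{\nu,i}^2(\theta)}{nh^d}\right] \le \sigma''^2,\quad \sigma'' \asymp n^{-1/2}.$$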



Hence arguing as in the proof of Proposition A.1 gives, under Assumption K,

$$E\left[\sup_{\theta\in\Theta^1}\left|\frac{1}{(nh^d)^{1/2}}\sum_{i=1}^n S_{\nu,i}(\theta)\right|\right] = O\left(n^{1/2}\sigma''\log^{1/2}n+(\sigma''+w'')\log n\right) = O\left(\log^{1/2}n\right).$$

The Markov inequality then shows that the Lemma is proved.



□ 
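A Monte Carlo sketch of the two facts used for Lemma A.3, a mean-zero score and a variance of order one after normalization by $(nh^d)^{1/2}$, in a toy model chosen by us: $d = 1$, $X$ uniform on $[0,1]$, $Y = X + N(0,1)$, the uniform kernel, and the true conditional quantile $x + z_\alpha$ standing in for $Q^*(X;\theta)$. In this model the normalized sum has mean $0$ and variance $2\alpha(1-\alpha)/(2\nu+1)$ exactly. The sketch does not simulate the supremum over $\theta \in \Theta^1$, and the helper names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def score(X, Y, x, h, alpha, nu, q_star):
    """S_{nu,i} = 2 {1(Y_i <= q_star(X_i)) - alpha} z_i^nu K(z_i), z_i = (X_i - x)/h, d = 1.
    K is the uniform kernel 1{|z| <= 1}/2 (our choice)."""
    z = (X - x) / h
    k = 0.5 * (np.abs(z) <= 1.0)
    return 2.0 * ((Y <= q_star(X)).astype(float) - alpha) * z**nu * k

def normalized_sum(rng, n, x, h, alpha, nu):
    """One draw of sum_i S_{nu,i} / (n h)^{1/2} in the toy model Y = X + N(0, 1)."""
    X = rng.uniform(0.0, 1.0, n)
    Y = X + rng.standard_normal(n)
    z_alpha = NormalDist().inv_cdf(alpha)
    # Stand-in for Q*: the true conditional quantile x + z_alpha, linear in x.
    return score(X, Y, x, h, alpha, nu, lambda t: t + z_alpha).sum() / np.sqrt(n * h)

rng = np.random.default_rng(1)
alpha, nu = 0.3, 1
draws = np.array([normalized_sum(rng, 5_000, 0.5, 0.1, alpha, nu) for _ in range(1_000)])
print(draws.mean(), draws.var(), 2 * alpha * (1 - alpha) / (2 * nu + 1))
```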



